To deploy and optimize Llama 3 in a Linux environment, the steps below cover two common approaches: a quick setup with Ollama, and an optimized build with NVIDIA TensorRT-LLM.

First, install Ollama with the official script:
curl -fsSL https://ollama.com/install.sh | sh
(Optional) To store models in a custom directory, edit the service unit:
sudo systemctl edit ollama.service
and add the following under the [Service] section:
Environment="OLLAMA_MODELS=/home//ollama_models"
Reload systemd and restart the service so the change takes effect:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Then pull and run the 8B model:
ollama run llama3:8b
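Once the model runs interactively, the Ollama server can also be queried over its local REST API (default port 11434). A minimal sketch, assuming the service and the llama3:8b model from the steps above are available; the prompt is just an example:

```shell
# Query the local Ollama server over its REST API (default port 11434).
# Assumes the ollama service and the llama3:8b model are already set up.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Explain what a systemd unit file is, in one sentence.",
  "stream": false
}'
```

Setting "stream": false returns one JSON object with the full response instead of a token-by-token stream, which is convenient for scripting.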
Alternatively, for optimized GPU inference, deploy with TensorRT-LLM. Clone the v0.8.0 release:
git clone -b v0.8.0 https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
Prepare a model repository and download the Llama 3 weights (here, a Chinese chat fine-tune hosted on ModelScope):
mkdir -p /root/model_repository
git lfs install
cd /root/model_repository
git clone https://www.modelscope.cn/pooka74/LLaMA3-8B-Chat-Chinese.git
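Because the weights are tracked with Git LFS, it is worth confirming they were fully downloaded rather than left as small pointer stubs. A quick check, assuming the clone above succeeded:

```shell
# Verify the LFS-tracked weight files were actually fetched.
cd /root/model_repository/LLaMA3-8B-Chat-Chinese
git lfs ls-files     # lists the large files tracked by LFS
du -sh .             # an 8B model in fp16 should total roughly 15+ GB
```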
Start a CUDA development container, mounting the TensorRT-LLM source tree and the model repository:
docker run --rm --runtime=nvidia --gpus all -p 8800:8000 \
  --volume ${PWD}:/TensorRT-LLM \
  -v /root/model_repository/:/model_repository \
  --entrypoint /bin/bash -it --workdir /TensorRT-LLM \
  nvidia/cuda:12.1.0-devel-ubuntu22.04-trt-env
Inside the container, install the matching TensorRT-LLM Python package:
pip3 install tensorrt_llm==0.8.0 -U --extra-index-url https://pypi.nvidia.com
Convert the Hugging Face-format weights into a TensorRT-LLM checkpoint:
python3 examples/llama/convert_checkpoint.py --model_dir /model_repository/LLaMA3-8B-Chat-Chinese --output_dir ./tllm_checkpoint_1
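From the converted checkpoint, the next step in TensorRT-LLM 0.8.0 is to compile a TensorRT engine with trtllm-build and smoke-test it with the repo's example runner. A sketch, where the engine directory name, the --gemm_plugin float16 setting, and the test prompt are illustrative choices, not from the source:

```shell
# Build a TensorRT engine from the converted checkpoint (requires a GPU).
# ./llama3_engine is an illustrative output path.
trtllm-build --checkpoint_dir ./tllm_checkpoint_1 \
             --output_dir ./llama3_engine \
             --gemm_plugin float16

# Smoke-test the engine with the example runner shipped in the repo.
python3 examples/run.py --engine_dir ./llama3_engine \
                        --tokenizer_dir /model_repository/LLaMA3-8B-Chat-Chinese \
                        --input_text "你好" \
                        --max_output_len 64
```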
With these steps, you can efficiently deploy and optimize the Llama 3 model in a Linux environment to meet different performance and feature requirements.