To deploy Llama 3 efficiently in a Linux environment, you can follow the steps below. They cover two common routes: a quick setup with Ollama, and an optimized build with NVIDIA TensorRT-LLM.
Option 1: quick deployment with Ollama. Install Ollama with the official script, then open a systemd override so you can relocate the model storage directory:

```bash
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl edit ollama.service
```

In the `[Service]` section of the override, add the following (the username between the slashes was elided in the original; substitute your own):

```bash
Environment="OLLAMA_MODELS=/home//ollama_models"
```

Then reload systemd so the override is picked up:

```bash
sudo systemctl daemon-reload
```
Restart the service, then pull and run the model:

```bash
sudo systemctl restart ollama
ollama run llama3:8b
```
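Once the model is running, Ollama also serves a local REST API on port 11434, which is convenient for a scripted smoke test. A minimal sketch (the prompt text is illustrative):

```bash
# Send a single non-streaming generation request to the local Ollama server.
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```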
Option 2: optimized inference with TensorRT-LLM. Clone the v0.8.0 release and create a model repository directory:

```bash
git clone -b v0.8.0 https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
mkdir -p /root/model_repository
```

Download the model weights from ModelScope (Git LFS is required for the large weight files):

```bash
git lfs install
cd /root/model_repository
git clone https://www.modelscope.cn/pooka74/LLaMA3-8B-Chat-Chinese.git
```

From the TensorRT-LLM checkout (the command below mounts `${PWD}` as `/TensorRT-LLM`, so run it there), start a CUDA development container with the repository and model directory mounted, and then, inside the container, install the matching TensorRT-LLM wheel:

```bash
docker run --rm --runtime=nvidia --gpus all -p 8800:8000 \
  --volume ${PWD}:/TensorRT-LLM \
  -v /root/model_repository/:/model_repository \
  --entrypoint /bin/bash -it --workdir /TensorRT-LLM \
  nvidia/cuda:12.1.0-devel-ubuntu22.04-trt-env
pip3 install tensorrt_llm==0.8.0 -U --extra-index-url https://pypi.nvidia.com
```
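Before converting checkpoints, it can save time to verify that the GPU is visible inside the container and that the wheel imports cleanly. A minimal sketch, run inside the container (it assumes the package exposes a `__version__` attribute, which it does in recent releases):

```bash
# Confirm the container can see the GPU(s).
nvidia-smi
# Confirm the tensorrt_llm package imports and report its version.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```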
Still inside the container, convert the downloaded Hugging Face checkpoint into the TensorRT-LLM checkpoint format:

```bash
python3 examples/llama/convert_checkpoint.py \
  --model_dir /model_repository/LLaMA3-8B-Chat-Chinese \
  --output_dir ./tllm_checkpoint_1
```

Note that the converted checkpoint must still be compiled into a TensorRT engine before it can serve requests; a sketch of that final step follows below. With these steps, you can deploy and optimize the Llama 3 model efficiently on Linux to suit a range of performance and functional requirements.
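For the engine build and a smoke test, the commands below follow the llama example in the TensorRT-LLM v0.8.0 repository; the engine output path and prompt are illustrative, and flag names may differ in other releases:

```bash
# Compile the converted checkpoint into a TensorRT engine with the FP16 GEMM plugin.
trtllm-build --checkpoint_dir ./tllm_checkpoint_1 \
  --output_dir ./llama3_engine \
  --gemm_plugin float16

# Run a short generation against the freshly built engine.
python3 examples/run.py --engine_dir ./llama3_engine \
  --tokenizer_dir /model_repository/LLaMA3-8B-Chat-Chinese \
  --input_text "Hello, please introduce yourself." \
  --max_output_len 100
```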