1. Preparation Before Installation
2. Install Ollama and Run Llama 3
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
ollama pull llama3:8b
ollama run llama3:8b
ollama list
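Once the model is pulled, you can also exercise it through Ollama's local HTTP API instead of the interactive ollama run session; a minimal check (the prompt text is only an example):
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
Setting "stream": false returns a single JSON object instead of a token-by-token stream.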
3. Run as a System Service and Enable Remote Access
sudo vim /etc/systemd/system/ollama.service
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin"
# Allow remote access
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Optional: cross-origin (CORS) requests
Environment="OLLAMA_ORIGINS=*"
# Optional: custom model storage path (example)
Environment="OLLAMA_MODELS=/home/ollama/.ollama/models"
[Install]
WantedBy=default.target
sudo systemctl daemon-reload
sudo systemctl enable --now ollama
curl http://127.0.0.1:11434
(should return "Ollama is running")
curl http://<server-IP>:11434
If the port is already occupied, find the process with sudo lsof -i :11434 or ss -ltnp | grep 11434.
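If either curl probe fails, systemd's own tooling is the quickest way to see why; for example:
sudo systemctl status ollama
journalctl -u ollama -n 50 --no-pager
curl http://127.0.0.1:11434/api/tags
The last call hits Ollama's /api/tags endpoint, which lists the models currently available locally.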
4. Deploy the Open WebUI Interface (Optional)
docker run -d \
-p 3000:8080 \
--gpus all \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
On a machine without an NVIDIA GPU, omit --gpus all.
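If Ollama runs on another host or in its own container, Open WebUI can be pointed at it through the OLLAMA_BASE_URL environment variable; a sketch, assuming Ollama is reachable from the container at host.docker.internal:11434:
docker run -d \
-p 3000:8080 \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main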
5. GPU Acceleration and Docker Deployment (Optional)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
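Before starting the Ollama container, it is worth confirming that Docker can see the GPU at all; the toolkit mounts nvidia-smi into containers, so a plain base image suffices (the ubuntu image here is just an example):
docker run --rm --gpus all ubuntu nvidia-smi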
docker run -d \
--gpus=all \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama
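To confirm the GPU is visible from inside the running container:
docker exec -it ollama nvidia-smi
For AMD GPUs, use the ROCm image instead: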
docker run -d \
--device /dev/kfd --device /dev/dri \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama:rocm
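With either container running, models are pulled and served through the container itself; for example:
docker exec -it ollama ollama run llama3:8b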
6. Common Issues and Offline Deployment
Port already in use: find and release the process with lsof -i :11434 or ss -ltnp | grep 11434, or change the port in OLLAMA_HOST.
Remote access fails: confirm OLLAMA_HOST=0.0.0.0:11434 is set and that the cloud security group / firewall allows port 11434.
Model pull fails: retry ollama pull llama3:8b; if necessary, switch to a mirror source or import the model offline.
Offline deployment: on a machine with internet access, export the Docker images (docker save); on the offline machine, load them (docker load) and start the containers the same way. Ollama models must be pulled into the local cache directory on a networked machine first, then copied to the same path on the offline machine.
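A concrete sketch of that offline workflow, assuming the default per-user model cache at ~/.ollama/models (adjust the path if OLLAMA_MODELS was customized as in the service file above):
# On a machine with internet access: export images and pre-pull the model
docker save -o ollama.tar ollama/ollama
docker save -o open-webui.tar ghcr.io/open-webui/open-webui:main
ollama pull llama3:8b
tar -czf ollama-models.tar.gz -C ~/.ollama models
# Copy the tarballs to the offline machine, then:
docker load -i ollama.tar
docker load -i open-webui.tar
mkdir -p ~/.ollama
tar -xzf ollama-models.tar.gz -C ~/.ollama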