A Fast Track to Installing OpenELM on Ubuntu
I. Pre-installation preparation

Install the base tooling, then verify the versions:

sudo apt update && sudo apt install -y git git-lfs python3 python3-venv
python3 -V
git lfs version

II. Option 1: Docker deployment (recommended)
1) Pull the base image and start a container (the example publishes port 7860 for a later WebUI or Gradio demo):
docker pull nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
mkdir -p ~/openelm && cd ~/openelm
docker run -it --gpus all \
-v $PWD:/workspace \
-p 7860:7860 \
--name openelm-deploy \
nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04 /bin/bash

2) Inside the container, create a virtual environment and install the dependencies:
apt update && apt install -y python3-venv
python3 -m venv venv && . venv/bin/activate
pip install --upgrade pip
pip install torch==2.1.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.36.2 tokenizers==0.15.2 sentencepiece==0.2.0 accelerate==0.25.0

3) Fetch the model (pick one of the two options):
Option A: clone from the GitCode mirror:

git clone https://gitcode.com/mirrors/apple/OpenELM-3B-Instruct.git
cd OpenELM-3B-Instruct

Option B: clone from Hugging Face (needs an HF token with read access):

pip install huggingface-hub
huggingface-cli login   # paste your HF_TOKEN (read permission required)
git lfs install
git clone https://huggingface.co/apple/OpenELM-3B-Instruct

4) Run inference (inside the container):
python - <<'PY'
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "apple/OpenELM-3B-Instruct"
# OpenELM ships custom modeling code, so trust_remote_code is required.
# The OpenELM repos bundle no tokenizer; Apple's model card uses the
# Llama-2 tokenizer (meta-llama/Llama-2-7b-hf, a gated repo on the Hub).
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
m = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)
prompt = "Once upon a time there was"
inputs = tok(prompt, return_tensors="pt").to(m.device)
out = m.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
print(tok.decode(out[0], skip_special_tokens=True))
PY

5) For a web demo, install Gradio inside the container and serve it on the exposed port 7860.
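Step 5 can be sketched as a minimal Gradio app (an illustrative sketch, not official OpenELM code; it assumes `pip install gradio` has been run and loads the model the same way as the inference script above):

```python
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is needed for OpenELM's custom modeling code;
# the Llama-2 tokenizer follows Apple's model card instructions.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
m = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-3B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

def reply(prompt: str) -> str:
    # Tokenize, sample one completion, and decode it back to text.
    inputs = tok(prompt, return_tensors="pt").to(m.device)
    out = m.generate(**inputs, max_new_tokens=128, temperature=0.7, do_sample=True)
    return tok.decode(out[0], skip_special_tokens=True)

# server_name="0.0.0.0" makes the app reachable through the container's
# published port (-p 7860:7860) from the host browser.
gr.Interface(fn=reply, inputs="text", outputs="text").launch(
    server_name="0.0.0.0", server_port=7860
)
```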
III. Option 2: Local virtual-environment deployment (no Docker)
1) Create and activate a virtual environment:
python3 -m venv ~/venvs/openelm
source ~/venvs/openelm/bin/activate
pip install --upgrade pip

2) Install the dependencies (CPU or CUDA 11.8/12.x both work; the example targets CUDA 12.1):
pip install torch==2.1.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.36.2 tokenizers==0.15.2 sentencepiece==0.2.0 accelerate==0.25.0

3) Fetch the model (same as above, pick one):
git clone https://gitcode.com/mirrors/apple/OpenELM-3B-Instruct.git

or:

git lfs install && git clone https://huggingface.co/apple/OpenELM-3B-Instruct

4) Run the minimal inference script (identical to the one used inside the Docker container):
python - <<'PY'
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "apple/OpenELM-3B-Instruct"
# OpenELM ships custom modeling code, so trust_remote_code is required.
# The OpenELM repos bundle no tokenizer; Apple's model card uses the
# Llama-2 tokenizer (meta-llama/Llama-2-7b-hf, a gated repo on the Hub).
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
m = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)
prompt = "Once upon a time there was"
inputs = tok(prompt, return_tensors="pt").to(m.device)
out = m.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
print(tok.decode(out[0], skip_special_tokens=True))
PY

IV. Common issues and optimizations
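The most common issue is running out of GPU memory. A back-of-the-envelope check helps: OpenELM-3B-Instruct has roughly 3.04 B parameters (Apple's model card figure), so fp16 weights alone occupy about 5.7 GiB before activations and the KV cache are counted. A quick sketch (the helper name is ours):

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory taken by model weights alone (no activations/KV cache)."""
    return n_params * bytes_per_param / 1024**3

# ~3.04e9 params (Apple's model card figure) at 2 bytes each in fp16
print(round(weight_memory_gib(3.04e9, 2), 1))  # -> 5.7
```

If that does not fit your GPU, loading the weights in 8-bit or 4-bit (e.g. via bitsandbytes quantization) roughly halves or quarters the figure, at some cost in output quality.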