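The online part of from_pretrained(..., trust_remote_code=True)

Taking OpenELM-270M as the example (the commands below use the Instruct variant; other sizes work the same way):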
git lfs install
git clone https://huggingface.co/apple/OpenELM-270M-Instruct

The directory structure should look similar to:
OpenELM-270M-Instruct/
├── config.json
├── model.safetensors
├── tokenizer.json
├── tokenizer_config.json
├── special_tokens_map.json
└── generation_config.json

mkdir openelm_offline
pip download \
    torch \
    transformers \
    accelerate \
    sentencepiece \
    -d openelm_offline

Copy openelm_offline/ to the offline machine, then install from it:

pip install --no-index --find-links=openelm_offline openelm_offline/*.whl

Or:
pip install --no-index --find-links=openelm_offline torch transformers accelerate sentencepiece

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./OpenELM-270M-Instruct"  # local path

# Load the tokenizer and model from disk only; no network lookups.
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    local_files_only=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    local_files_only=True,
    trust_remote_code=True,  # allow the model code bundled with the checkpoint to run
    torch_dtype="auto",
    device_map="auto",
)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

✅ Key points
local_files_only=True

Couldn't import sentencepiece

✅ Fix:
pip install sentencepiece

OSError: Can't load tokenizer

✅ Check that the model directory contains:
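tokenizer.json
tokenizer_config.json
special_tokens_map.json

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    local_files_only=True,
    trust_remote_code=True,  # allow the model code bundled with the checkpoint to run
    torch_dtype="float32",
    device_map=None,
)

Or enable CPU inference: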
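device_map="cpu"

pip download bitsandbytes -d openelm_offline

from transformers import BitsAndBytesConfig

# Load the weights in 4-bit precision to reduce memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    local_files_only=True,
    trust_remote_code=True,  # allow the model code bundled with the checkpoint to run
    quantization_config=bnb_config,
    device_map="auto",
)

A layout you can use: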
offline_openelm/
├── model/
│ └── OpenELM-270M-Instruct/
├── env/
└── run.py

OpenELM offline setup = local model + local dependencies + local_files_only=True

If you'd like, I can help further. Just tell me your operating system and whether you are using a GPU.
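
The summary above (local model + local dependencies + local_files_only=True) can be checked before the first run with a small preflight script. This is only a sketch under stated assumptions: the function name `preflight` and the constants `EXPECTED_FILES` and `EXPECTED_PACKAGES` are hypothetical, the file names are copied from the directory listing earlier in this answer, and the `HF_HUB_OFFLINE` environment variable is an extra safeguard that does not appear in the original steps.

```python
import importlib.util
import os
from pathlib import Path

# File names copied from the directory listing earlier in this answer;
# adjust them if your checkout looks different.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "generation_config.json",
]

# Import names of the packages fetched with `pip download`.
EXPECTED_PACKAGES = ["torch", "transformers", "accelerate", "sentencepiece"]


def preflight(model_dir, files=EXPECTED_FILES, packages=EXPECTED_PACKAGES):
    """Return (missing_files, missing_packages) for an offline setup."""
    root = Path(model_dir)
    # A file counts as present only if it exists as a regular file.
    missing_files = [name for name in files if not (root / name).is_file()]
    # find_spec returns None when a top-level package cannot be located.
    missing_packages = [
        name for name in packages if importlib.util.find_spec(name) is None
    ]
    return missing_files, missing_packages


if __name__ == "__main__":
    # Assumption: set this before transformers is imported anywhere in the
    # process, so the Hugging Face libraries do not attempt network access.
    os.environ.setdefault("HF_HUB_OFFLINE", "1")
    missing_files, missing_packages = preflight("./OpenELM-270M-Instruct")
    print("missing files:", missing_files or "none")
    print("missing packages:", missing_packages or "none")
```

If both lists come back empty, the loading code shown earlier should have what it needs on disk; a non-empty list points at either the model directory or the offline package install.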