OpenELM-450M open-source language model: freely available in multiple parameter sizes for improved accuracy

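Developed by Apple

OpenELM is a family of open, efficient language models. It uses a layer-wise scaling strategy to allocate parameters and improve accuracy. Pretrained and instruction-tuned versions are available from 270M to 3B parameters.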

Large Language Model

Transformers

#layer-wise parameter allocation #efficient language model #multiple parameter scales

Released: 4/12/2024

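Model Overview

The OpenELM model family focuses on improving language model performance through an efficient parameter allocation strategy, and is suitable for a range of natural language processing tasks.

Model Features

Layer-wise scaling strategy

Allocates parameters efficiently within each layer of the Transformer model to improve performance

Multiple model sizes

Models range from 270M to 3B parameters to suit different compute budgets

Complete open-source framework

Includes code for the full pipeline: data preparation, training, fine-tuning, and evaluation

Transparent research support

Provides multiple pretrained checkpoints and training logs to facilitate open research

Model Capabilities

Text generation

Language understanding

Instruction following

Use Cases

Natural language processing

Text generation

Use the pretrained models to generate coherent text

Instruction following

Use the instruction-tuned models to carry out specific tasks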

🚀 OpenELM

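OpenELM is a family of open, efficient language models. It uses a layer-wise scaling strategy to allocate parameters efficiently within each layer of the Transformer model, which improves accuracy. The models were pretrained with the CoreNet library, and both pretrained and instruction-tuned models are released at 270M, 450M, 1.1B, and 3B parameters. The release also includes the complete framework, covering data preparation, training, fine-tuning, and evaluation, along with multiple pretrained checkpoints and training logs, to facilitate open research.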

🚀 Quick Start

We provide an example function in generate_openelm.py for generating output from OpenELM models loaded via the HuggingFace Hub.

You can try the model by running the following command:

python generate_openelm.py --model apple/OpenELM-450M --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2

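See the Hugging Face documentation on user access tokens to obtain your access token.

Additional arguments can be passed to the Hugging Face generate function through generate_kwargs. For example, to speed up inference, you can try prompt lookup speculative generation by passing the prompt_lookup_num_tokens argument: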

python generate_openelm.py --model apple/OpenELM-450M --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 prompt_lookup_num_tokens=10

Alternatively, try model-wise speculative generation with an assistant model by passing a smaller model through the assistant_model argument, for example:

python generate_openelm.py --model apple/OpenELM-450M --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 --assistant_model [SMALLER_MODEL]
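
The commands above forward key=value pairs to the Hugging Face generate function. The sketch below shows one way that forwarding could work in Python. It is not the actual code of generate_openelm.py: the helper names parse_generate_kwargs and generate_text, the max_new_tokens value, and the choice to load the tokenizer from meta-llama/Llama-2-7b-hf (the tokenizer named in the evaluation settings of this document) are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def parse_generate_kwargs(pairs: List[str]) -> Dict[str, Any]:
    """Turn ['repetition_penalty=1.2', ...] into a dict, casting numbers where possible."""
    kwargs: Dict[str, Any] = {}
    for pair in pairs:
        key, _, raw = pair.partition("=")
        for cast in (int, float):
            try:
                kwargs[key] = cast(raw)
                break
            except ValueError:
                continue
        else:
            kwargs[key] = raw  # keep non-numeric values as plain strings
    return kwargs


def generate_text(prompt: str, model_id: str, hf_token: str, pairs: List[str],
                  assistant_id: Optional[str] = None) -> str:
    """Hypothetical helper: load a model and generate with the parsed arguments."""
    # Imported lazily so the parsing helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the LLaMA tokenizer repository used in the evaluation settings.
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token=hf_token)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    kwargs = parse_generate_kwargs(pairs)
    if assistant_id is not None:
        # Model-wise speculative generation: a smaller model drafts the tokens.
        kwargs["assistant_model"] = AutoModelForCausalLM.from_pretrained(
            assistant_id, trust_remote_code=True)

    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    output_ids = model.generate(input_ids, max_new_tokens=64, **kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling parse_generate_kwargs(["repetition_penalty=1.2", "prompt_lookup_num_tokens=10"]) yields a dict that can be unpacked directly into model.generate, which is why the command line accepts any number of key=value pairs.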

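✨ Key Features

Efficient parameter allocation: a layer-wise scaling strategy allocates parameters efficiently within each layer of the Transformer model, improving accuracy.
Multiple model sizes: pretrained and instruction-tuned models are released at 270M, 450M, 1.1B, and 3B parameters.
Complete open-source framework: the full framework is released, covering data preparation, training, fine-tuning, and evaluation, along with multiple pretrained checkpoints and training logs, to support open research.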

📦 Installation

Evaluation environment setup

Install the following dependencies:

# Install the public lm-eval-harness
harness_repo="public-lm-eval-harness"
git clone https://github.com/EleutherAI/lm-evaluation-harness ${harness_repo}
cd ${harness_repo}
# Use the main branch as of March 15, 2024 (SHA dc90fec)
git checkout dc90fec
pip install -e .
cd ..

# 66d6242 is the main branch as of April 1, 2024
pip install datasets@git+https://github.com/huggingface/datasets.git@66d6242
# Quote the version specifiers so the shell does not treat ">" as a redirection
pip install 'tokenizers>=0.15.2' 'transformers>=4.38.2' 'sentencepiece>=0.2.0'

📚 Detailed Documentation

Main Results

Zero-Shot

Model Size	ARC-c	ARC-e	BoolQ	HellaSwag	PIQA	SciQ	WinoGrande	Average
OpenELM-270M	26.45	45.08	53.98	46.71	69.75	84.70	53.91	54.37
OpenELM-270M-Instruct	30.55	46.68	48.56	52.07	70.78	84.40	52.72	55.11
OpenELM-450M	27.56	48.06	55.78	53.97	72.31	87.20	58.01	57.56
OpenELM-450M-Instruct	30.38	50.00	60.37	59.34	72.63	88.00	58.96	59.95
OpenELM-1_1B	32.34	55.43	63.58	64.81	75.57	90.60	61.72	63.44
OpenELM-1_1B-Instruct	37.97	52.23	70.00	71.20	75.03	89.30	62.75	65.50
OpenELM-3B	35.58	59.89	67.40	72.44	78.24	92.70	65.51	67.39
OpenELM-3B-Instruct	39.42	61.74	68.17	76.36	79.00	92.50	66.85	69.15
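
The final column of the zero-shot table appears to be the unweighted mean of the seven task scores. The short check below recomputes it for the two 450M rows, using numbers copied from the table. The helper name mean_score is introduced here for illustration only.

```python
# Scores copied from the zero-shot table, in column order:
# ARC-c, ARC-e, BoolQ, HellaSwag, PIQA, SciQ, WinoGrande
zero_shot = {
    "OpenELM-450M": [27.56, 48.06, 55.78, 53.97, 72.31, 87.20, 58.01],
    "OpenELM-450M-Instruct": [30.38, 50.00, 60.37, 59.34, 72.63, 88.00, 58.96],
}


def mean_score(scores):
    """Unweighted mean of the task scores, rounded to two decimals like the table."""
    return round(sum(scores) / len(scores), 2)


averages = {name: mean_score(scores) for name, scores in zero_shot.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```

Both values match the table (57.56 and 59.95). Averages recomputed from the rounded per-task scores can differ from a published average in the last digit when the original was computed from unrounded scores.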

LLM360

Model Size	ARC-c	HellaSwag	MMLU	TruthfulQA	WinoGrande	Average
OpenELM-270M	27.65	47.15	25.72	39.24	53.83	38.72
OpenELM-270M-Instruct	32.51	51.58	26.70	38.72	53.20	40.54
OpenELM-450M	30.20	53.86	26.01	40.18	57.22	41.50
OpenELM-450M-Instruct	33.53	59.31	25.41	40.48	58.33	43.41
OpenELM-1_1B	36.69	65.71	27.05	36.98	63.22	45.93
OpenELM-1_1B-Instruct	41.55	71.83	25.65	45.95	64.72	49.94
OpenELM-3B	42.24	73.28	26.76	34.98	67.25	48.90
OpenELM-3B-Instruct	47.70	76.87	24.80	38.76	67.96	51.22

OpenLLM Leaderboard

Model Size	ARC-c	CrowS-Pairs	HellaSwag	MMLU	PIQA	RACE	TruthfulQA	WinoGrande	Average
OpenELM-270M	27.65	66.79	47.15	25.72	69.75	30.91	39.24	53.83	45.13
OpenELM-270M-Instruct	32.51	66.01	51.58	26.70	70.78	33.78	38.72	53.20	46.66
OpenELM-450M	30.20	68.63	53.86	26.01	72.31	33.11	40.18	57.22	47.69
OpenELM-450M-Instruct	33.53	67.44	59.31	25.41	72.63	36.84	40.48	58.33	49.25
OpenELM-1_1B	36.69	71.74	65.71	27.05	75.57	36.46	36.98	63.22	51.68
OpenELM-1_1B-Instruct	41.55	71.02	71.83	25.65	75.03	39.43	45.95	64.72	54.40
OpenELM-3B	42.24	73.29	73.28	26.76	78.24	38.76	34.98	67.25	54.35
OpenELM-3B-Instruct	47.70	72.33	76.87	24.80	79.00	38.47	38.76	67.96	55.73

See the technical report for more results and comparisons.

Evaluation

Evaluating OpenELM

# OpenELM-450M
hf_model=apple/OpenELM-450M

# lm-eval-harness sets add_bos_token to False by default, but OpenELM uses the LLaMA tokenizer, which requires add_bos_token to be True, so this flag is needed
tokenizer=meta-llama/Llama-2-7b-hf
add_bos_token=True
batch_size=1

mkdir -p lm_eval_output

shot=0
task=arc_challenge,arc_easy,boolq,hellaswag,piqa,race,winogrande,sciq,truthfulqa_mc2
lm_eval --model hf \
        --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
        --tasks ${task} \
        --device cuda:0 \
        --num_fewshot ${shot} \
        --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
        --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log

shot=5
task=mmlu,winogrande
lm_eval --model hf \
        --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
        --tasks ${task} \
        --device cuda:0 \
        --num_fewshot ${shot} \
        --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
        --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log

shot=25
task=arc_challenge,crows_pairs_english
lm_eval --model hf \
        --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
        --tasks ${task} \
        --device cuda:0 \
        --num_fewshot ${shot} \
        --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
        --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log

shot=10
task=hellaswag
lm_eval --model hf \
        --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
        --tasks ${task} \
        --device cuda:0 \
        --num_fewshot ${shot} \
        --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
        --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
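
The four lm_eval invocations above differ only in the few-shot count and the task list. As a sketch, the same runs can be described as data and turned into argument lists in Python. The flags and values are the ones used in the commands above, while the names RUNS, run_name, and build_command are hypothetical helpers introduced here.

```python
from typing import List

HF_MODEL = "apple/OpenELM-450M"
TOKENIZER = "meta-llama/Llama-2-7b-hf"

# (num_fewshot, tasks) pairs exactly as listed in the shell commands.
RUNS = [
    (0, "arc_challenge,arc_easy,boolq,hellaswag,piqa,race,winogrande,sciq,truthfulqa_mc2"),
    (5, "mmlu,winogrande"),
    (25, "arc_challenge,crows_pairs_english"),
    (10, "hellaswag"),
]


def run_name(hf_model: str, tasks: str, shot: int) -> str:
    """Mirror the bash expansions ${hf_model//\\//_} and ${task//,/_}."""
    return f"{hf_model.replace('/', '_')}_{tasks.replace(',', '_')}-{shot}shot"


def build_command(hf_model: str, tokenizer: str, tasks: str, shot: int,
                  batch_size: int = 1) -> List[str]:
    """Assemble the lm_eval argument list for one run."""
    model_args = (f"pretrained={hf_model},trust_remote_code=True,"
                  f"add_bos_token=True,tokenizer={tokenizer}")
    return [
        "lm_eval", "--model", "hf",
        "--model_args", model_args,
        "--tasks", tasks,
        "--device", "cuda:0",
        "--num_fewshot", str(shot),
        "--output_path", f"./lm_eval_output/{run_name(hf_model, tasks, shot)}",
        "--batch_size", str(batch_size),
    ]


commands = [build_command(HF_MODEL, TOKENIZER, tasks, shot) for shot, tasks in RUNS]
```

Each entry in commands can be passed to subprocess.run. The run_name helper also shows what the bash parameter expansions produce: slashes in the model name and commas in the task list become underscores, so the 5-shot run writes to ./lm_eval_output/apple_OpenELM-450M_mmlu_winogrande-5shot.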

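🔧 Technical Details

Pretraining dataset: RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens. Please review the license agreements and terms of these datasets before using them.
Parameter allocation strategy: a layer-wise scaling strategy allocates parameters efficiently within each layer of the Transformer model, improving accuracy.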
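
To make the idea of layer-wise scaling concrete, here is a rough sketch. It assumes the per-layer attention width and feed-forward multiplier are interpolated linearly from the first layer to the last, which is how the technical report describes the approach. The function name, the default alpha and beta ranges, and the rounding are illustrative assumptions, not the values or code used to build OpenELM.

```python
from typing import List, Tuple


def layerwise_scaling(num_layers: int, d_model: int, head_dim: int,
                      alpha: Tuple[float, float] = (0.5, 1.0),
                      beta: Tuple[float, float] = (0.5, 4.0)) -> List[Tuple[int, int]]:
    """Return (num_heads, ffn_dim) per layer, growing linearly with depth.

    alpha scales the attention width and beta is the FFN multiplier;
    both ranges are hypothetical placeholders.
    """
    configs = []
    for i in range(num_layers):
        # Position of layer i in [0, 1]; a single-layer model sits at 0.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, int(a * d_model / head_dim))
        ffn_dim = int(b * d_model)
        configs.append((num_heads, ffn_dim))
    return configs


for layer, (heads, ffn) in enumerate(layerwise_scaling(4, d_model=1024, head_dim=64)):
    print(f"layer {layer}: heads={heads}, ffn_dim={ffn}")
```

Unlike a uniform Transformer, where every layer has the same number of heads and the same feed-forward width, this gives early layers fewer parameters and later layers more while keeping the total budget comparable. See the technical report for the actual hyperparameters.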

📄 License

This project is released under the apple-sample-code-license license.

🔗 Citation

If you find our work useful, please cite:

@article{mehtaOpenELMEfficientLanguage2024,
	title = {{OpenELM}: {An} {Efficient} {Language} {Model} {Family} with {Open} {Training} and {Inference} {Framework}},
	shorttitle = {{OpenELM}},
	url = {https://arxiv.org/abs/2404.14619v1},
	language = {en},
	urldate = {2024-04-24},
	journal = {arXiv.org},
	author = {Mehta, Sachin and Sekhavat, Mohammad Hossein and Cao, Qingqing and Horton, Maxwell and Jin, Yanzi and Sun, Chenfan and Mirzadeh, Iman and Najibi, Mahyar and Belenko, Dmitry and Zatloukal, Peter and Rastegari, Mohammad},
	month = apr,
	year = {2024},
}

@inproceedings{mehta2022cvnets, 
     author = {Mehta, Sachin and Abdolhosseini, Farzad and Rastegari, Mohammad}, 
     title = {CVNets: High Performance Library for Computer Vision}, 
     year = {2022}, 
     booktitle = {Proceedings of the 30th ACM International Conference on Multimedia}, 
     series = {MM '22} 
}