OLMo-7B-Instruct open-source language model: built for question answering, free to use


Olmo 7B Instruct

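Developed by allenai

OLMo 7B Instruct is an open language model whose base model is trained on the Dolma dataset; it is then tuned with SFT and DPO for question answering.

Large Language Model

Transformers

English · License: Apache-2.0 · #English instruction tuning #Open language model #DPO fine-tuning

Downloads: 365

Released: 2/23/2024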

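Model Overview

OLMo is a series of open language models intended to advance the scientific study of language models. The 7B Instruct version is fine-tuned for better question-answering performance.

Model Features

Open research

All training code, checkpoints, and model details are public to support scientific research.

Tuned for question answering

SFT followed by DPO fine-tuning improves question-answering performance over the base model.

Low toxicity

On ToxiGen, the share of toxic outputs drops from 81.4% for the base model to 1.7%.

Model Capabilities

English text generation

Question answering

Instruction following

Use Cases

Education and research

Language model research

Studying the behavior and performance of language models

Offers a fully transparent training process and model details

Intelligent assistants

Question answering systems

Building knowledge-based Q&A applications

Scores 52.0 on TruthfulQA (% Info+True)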

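🚀 Introduction to OLMo 7B Instruct

OLMo 7B Instruct is a model designed for language model research. It adapts the OLMo base model and performs well on question-answering tasks. Fine-tuned on the Tulu 2 SFT Mix and UltraFeedback data, it shows how much existing fine-tuning techniques can improve a base model.

🚀 Quick Start

For transformers v4.40.0 or newer, OLMo 7B Instruct HF is recommended instead.

This model requires installing ai2-olmo with pip, and using ai2-olmo >= 0.3.0 or HuggingFace Transformers <= 4.39. New versions of the model will be released soon with improved compatibility.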
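The version guidance above can be expressed as a small check. This is an illustrative sketch only: `recommended_variant` and `parse_major_minor` are hypothetical helpers, not part of ai2-olmo or Transformers, and the comparison looks only at the major and minor version numbers.

```python
def parse_major_minor(version: str) -> tuple:
    """Return (major, minor) as integers from a string such as '4.39.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def recommended_variant(transformers_version: str) -> str:
    """Map a Transformers version to the checkpoint variant suggested above."""
    if parse_major_minor(transformers_version) >= (4, 40):
        return "OLMo 7B Instruct HF"
    # Older Transformers: use this checkpoint together with ai2-olmo >= 0.3.0.
    return "OLMo 7B Instruct (with ai2-olmo)"


for v in ("4.39.3", "4.40.0"):
    print(v, "->", recommended_variant(v))
```

The installed version can be read with `importlib.metadata.version("transformers")` and passed to the helper.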

Installation

Quick-start inference requires the following installation:

pip install ai2-olmo

Inference Example

Follow these steps to run inference with HuggingFace:

from hf_olmo import OLMoForCausalLM, OLMoTokenizerFast

# Load the model and tokenizer
olmo = OLMoForCausalLM.from_pretrained("allenai/OLMo-7B-Instruct")
tokenizer = OLMoTokenizerFast.from_pretrained("allenai/OLMo-7B-Instruct")

# Build the prompt with the model's chat template
chat = [
    { "role": "user", "content": "What is language modeling?" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# encode() with return_tensors="pt" returns a tensor of input IDs, not a dict
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Optional: move the inputs and the model to CUDA
# inputs = inputs.to('cuda')
# olmo = olmo.to('cuda')
response = olmo.generate(input_ids=inputs.to(olmo.device), max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
# Example output:
# '<|user|>\nWhat is language modeling?\n<|assistant|>\nLanguage modeling is a type of natural language processing (NLP) task or machine learning task that...'
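The decoded string still contains the prompt. A minimal sketch for keeping only the reply, assuming the `<|assistant|>` marker appears exactly as in the sample output (`extract_reply` is a hypothetical helper, not part of hf_olmo):

```python
ASSISTANT_MARKER = "<|assistant|>\n"  # marker as it appears in the sample output


def extract_reply(decoded: str) -> str:
    """Return the text after the last assistant marker, or the input unchanged."""
    _, found, reply = decoded.rpartition(ASSISTANT_MARKER)
    return reply.strip() if found else decoded


sample = "<|user|>\nWhat is language modeling?\n<|assistant|>\nLanguage modeling is a type of NLP task."
print(extract_reply(sample))  # prints: Language modeling is a type of NLP task.
```

Splitting on the last marker keeps the helper usable for multi-turn chats, where earlier assistant turns also appear in the decoded text.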

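Inference can be sped up by quantizing the model, for example:

import torch
OLMoForCausalLM.from_pretrained("allenai/OLMo-7B-Instruct", torch_dtype=torch.float16, load_in_8bit=True)

(This requires bitsandbytes.) The quantized model is more sensitive to input types and to CUDA placement, so pass the input IDs as a tensor already on the GPU to avoid potential issues: inputs.to('cuda') when inputs comes from tokenizer.encode(...), or inputs.input_ids.to('cuda') when it comes from calling tokenizer(...) directly.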
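As a rough guide to what 8-bit loading changes, the sketch below estimates the memory taken by the weights alone at different precisions. It assumes a nominal 7 billion parameters (the exact count is not stated here) and ignores activations, the KV cache, and any quantization overhead, so the results are ballpark figures only.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

NOMINAL_PARAMS = 7e9  # assumption: "7B" taken at face value


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (10**9 bytes) for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.0f} GB")
```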

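⚠️ Important Note

If ai2-olmo is not installed correctly, you may see the error below. It comes from the naming used by an internal Python check: the message names hf_olmo, but the package to install is ai2-olmo. We will update the code soon to make this error clearer.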

    raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: hf_olmo. Run `pip install hf_olmo`
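One way to catch this before loading the model is to check whether the `hf_olmo` module can be found at all. The sketch below uses only the standard library; `missing_package_hint` is a hypothetical helper, and the suggested command is the `pip install ai2-olmo` step from the installation section.

```python
import importlib.util
from typing import Optional


def missing_package_hint(module_name: str = "hf_olmo",
                         pip_name: str = "ai2-olmo") -> Optional[str]:
    """Return an install hint if the module cannot be found, otherwise None."""
    if importlib.util.find_spec(module_name) is None:
        return f"Module '{module_name}' not found; try: pip install {pip_name}"
    return None


hint = missing_package_hint()
print(hint or "hf_olmo is importable")
```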

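✨ Key Features

OLMo is a series of open language models designed to advance the science of language models.
The base models are trained on the Dolma dataset; the adapted versions are trained on the Tulu SFT mixture and the UltraFeedback dataset.
All code, checkpoints, logs (coming soon), and training details for these models are released.
OLMo 7B Instruct and OLMo SFT are adapted versions trained for better question answering. They show the performance gain a base model can achieve with existing fine-tuning techniques.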

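📚 Documentation

Model Details

Model Versions

Two adapted model versions are released:

Model	Training Method	Datasets	Context Length
OLMo 7B SFT	SFT	Tulu 2 SFT Mix	2048
OLMo 7B Instruct	SFT + DPO	Tulu 2 SFT Mix + Ultrafeedback Cleaned	2048

The related OLMo base models are:

Size	Training Tokens	Layers	Hidden Size	Attention Heads	Context Length
OLMo 1B	3 trillion	16	2048	16	2048
OLMo 7B	2.5 trillion	32	4096	32	2048
OLMo 7B Twin 2T	2 trillion	32	4096	32	2048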

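Model Description

Developed by: Allen Institute for AI (AI2)
Supported by: Databricks, the Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University, AMD, CSC (Lumi Supercomputer), UW
Model type: Transformer-style autoregressive language model
Language: English
License: the code and model are released under Apache 2.0
Contact: technical inquiries: olmo at allenai dot org; press: press at allenai dot org
Data cutoff: February/March 2023, based on the Dolma dataset version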

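Model Resources

Project page: https://allenai.org/olmo
Repositories:
- Core repo (training, inference, fine-tuning, etc.): https://github.com/allenai/OLMo
- Evaluation code: https://github.com/allenai/OLMo-Eval
- Further fine-tuning code: https://github.com/allenai/open-instruct
Paper: OLMo: Accelerating the Science of Language Models (Groeneveld et al., 2024)
Technical blog post: https://blog.allenai.org/olmo-open-language-model-87ccfc95f580
W&B logs: https://wandb.ai/ai2-llm/OLMo-7B/reports/OLMo-7B--Vmlldzo2NzQyMzk5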

Evaluation

Core evaluation results for the 7B adapted models are as follows:

Model	MMLU 0-shot ↑	AlpacaEval %win ↑	ToxiGen % Toxic ↓	TruthfulQA %Info+True ↑
OLMo (base)	28.3	-	81.4	31.6
MPT Chat	33.8	46.8	0.1	42.7
Falcon Instruct	25.2	14.0	70.7	27.2
RPJ-INCITE Chat	27.0	38.0	46.4	53.0
Llama-2-Chat 7B	46.8	87.3	0.0	26.3
AI2 Tulu 2 7B	50.4	73.9	7.0	51.7
AI2 Tulu 2 7B DPO	50.7	85.1	0.5	- *
OLMo 7B SFT	47.3	57.0	14.4	41.2
OLMo 7B Instruct	46.2	69.3	1.7	52.0

* Following Ivison et al. (2023), the TruthfulQA score for Tulu 2 is not reported because of test set contamination.
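To make the effect of fine-tuning concrete, the sketch below computes the change from the base model to OLMo 7B Instruct for the three metrics that both rows report. The scores are copied from the table; the `improvement` helper and the sign convention (positive means Instruct is better) are choices made for this illustration.

```python
# Scores copied from the evaluation table (AlpacaEval omitted: no base score).
BASE = {"MMLU": 28.3, "ToxiGen": 81.4, "TruthfulQA": 31.6}
INSTRUCT = {"MMLU": 46.2, "ToxiGen": 1.7, "TruthfulQA": 52.0}
LOWER_IS_BETTER = {"ToxiGen"}


def improvement(metric: str) -> float:
    """Change in points from base to Instruct; positive means Instruct is better."""
    delta = INSTRUCT[metric] - BASE[metric]
    return -delta if metric in LOWER_IS_BETTER else delta


for metric in BASE:
    print(f"{metric}: {improvement(metric):+.1f} points")
```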

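Data

For details on the training data, see the documentation for Dolma, Tulu 2, and UltraFeedback.

Hyperparameters

The hyperparameters for the two training stages are as follows:

Stage	Learning Rate	Beta	Epochs	Warmup	Weight Decay	Gradient Clipping	Max Sequence Length
SFT	2 × 10^-6	N/A	3	Linear warmup over the first 3% of total training time, then cooldown to 0	0	0	2048
DPO	5 × 10^-7	0.1	3	Linear warmup over the first 10% of total training time, then cooldown to 0	0	0	2048

Compared to Tulu 2, the DPO hyperparameters are the same. SFT uses a lower learning rate and 3 epochs instead of 2 (and a sequence length of 2k instead of 8k).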
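The warmup-then-cooldown schedule in the table can be sketched as a plain function of the training step. Two assumptions are made here: training time is measured in optimizer steps, and the cooldown is linear (the table only says the rate cools down to 0). The total of 1000 steps is hypothetical.

```python
def lr_at(step: int, total_steps: int, peak_lr: float, warmup_frac: float) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0, total_steps - step) / decay_steps


TOTAL_STEPS = 1000  # hypothetical; the real step count is not given
STAGES = {"SFT": (2e-6, 0.03), "DPO": (5e-7, 0.10)}  # (peak LR, warmup fraction)

for name, (peak, frac) in STAGES.items():
    samples = [lr_at(s, TOTAL_STEPS, peak, frac) for s in (0, 50, 500, 1000)]
    print(name, ["%.2e" % lr for lr in samples])
```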

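🔧 Technical Details

Architecture

OLMo 7B is a Transformer-style autoregressive language model with 32 layers, a hidden size of 4096, 32 attention heads, and a context length of 2048. No further architecture details are given here.

Training

The adapted models start from the OLMo 7B base model. OLMo 7B SFT is fine-tuned with SFT on the Tulu 2 SFT Mix; OLMo 7B Instruct adds a DPO stage on Ultrafeedback Cleaned.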
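For readers unfamiliar with DPO, the sketch below shows the standard DPO objective for a single preference pair, using the beta of 0.1 from the hyperparameter table. It is a generic illustration in plain Python, not the OLMo training code (that lives in the open-instruct repository), and the log-probability values in the usage lines are made up.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How much more the policy prefers the chosen answer than the reference does.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Made-up log-probabilities for illustration.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy equals reference: log(2)
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))   # policy favors chosen more: lower loss
```

When the policy and the reference model agree, the margin is zero and the loss is log 2; the loss falls as the policy shifts probability toward the chosen response relative to the reference.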

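📄 License

The code and model are released under the Apache 2.0 license.

Bias, Risks, and Limitations

This adapted OLMo model is a research artifact. It is intended to benefit the research community interested in understanding the safety properties of LLMs and developers building safety tools for LLMs. For this reason, the model does not include a specific safety filter or safety training data. While our model scores well relative to its peers on ToxiGen, it can still generate harmful and sensitive content in response to some user prompts. We recommend that developers exercise caution and consider the risks of applying this technology. Developers should also consider implementing safeguards against bias, privacy violations, and other potential harms where appropriate. Finally, as with every LLM, OLMo may produce factual-sounding outputs that are not true, so developers and users are encouraged to confirm such outputs before relying on them. All users of this model are responsible for how they use it.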

Citation

BibTeX:

@article{Groeneveld2023OLMo,
  title={OLMo: Accelerating the Science of Language Models},
  author={Groeneveld, Dirk and Beltagy, Iz and Walsh, Pete and Bhagia, Akshita and Kinney, Rodney and Tafjord, Oyvind and Jha, Ananya Harsh and Ivison, Hamish and Magnusson, Ian and Wang, Yizhong and Arora, Shane and Atkinson, David and Authur, Russell and Chandu, Khyathi and Cohan, Arman and Dumas, Jennifer and Elazar, Yanai and Gu, Yuling and Hessel, Jack and Khot, Tushar and Merrill, William and Morrison, Jacob and Muennighoff, Niklas and Naik, Aakanksha and Nam, Crystal and Peters, Matthew E. and Pyatkin, Valentina and Ravichander, Abhilasha and Schwenk, Dustin and Shah, Saurabh and Smith, Will and Subramani, Nishant and Wortsman, Mitchell and Dasigi, Pradeep and Lambert, Nathan and Richardson, Kyle and Dodge, Jesse and Lo, Kyle and Soldaini, Luca and Smith, Noah A. and Hajishirzi, Hannaneh},
  journal={Preprint},
  year={2024}
}

APA: Groeneveld, D., Beltagy, I., Walsh, P., Bhagia, A., Kinney, R., Tafjord, O., Jha, A., Ivison, H., Magnusson, I., Wang, Y., Arora, S., Atkinson, D., Authur, R., Chandu, K., Cohan, A., Dumas, J., Elazar, Y., Gu, Y., Hessel, J., Khot, T., Merrill, W., Morrison, J., Muennighoff, N., Naik, A., Nam, C., Peters, M., Pyatkin, V., Ravichander, A., Schwenk, D., Shah, S., Smith, W., Subramani, N., Wortsman, M., Dasigi, P., Lambert, N., Richardson, K., Dodge, J., Lo, K., Soldaini, L., Smith, N., & Hajishirzi, H. (2024). OLMo: Accelerating the Science of Language Models. Preprint.