Mistral-7B-Instruct-v0.2 open-source large language model: 30% sparsity with competitive performance and no retraining

Mistral 7B Instruct V0.2 Sparsity 30 V0.1

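Developed by wang7776

This model is Mistral-7B-Instruct-v0.2, an improved instruction-tuned successor to Mistral-7B-Instruct-v0.1, pruned to 30% sparsity with the Wanda method. It stays competitive without any retraining.

Large language model

Transformers

License: Apache-2.0 #instruction-tuning #pruning-without-retraining #chat-template-support

Downloads: 75

Released: 1/17/2024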

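Model overview

An instruction-tuned large language model optimized for dialogue and instruction following, suited to tasks that need natural language understanding and generation.

Model features

Wanda pruning

Pruned to 30% sparsity with the Wanda method; stays competitive without retraining or weight updates.

Enhanced instruction tuning

Improved instruction tuning over v0.1, with better dialogue and instruction following.

Efficient attention

Uses grouped-query attention and sliding-window attention to reduce compute cost.

Model capabilities

Natural language understanding

Text generation

Dialogue systems

Instruction following

Use cases

Dialogue systems

Intelligent assistant

Build a conversational assistant that understands and responds to user queries.

Generates natural, fluent conversational responses.

Content generation

Creative writing

Generate creative text such as stories and poems.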

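🚀 About Mistral-7B-Instruct-v0.2

This project is a pruned variant of Mistral-7B-Instruct-v0.2 for text generation. Wanda pruning sets 30% of the weights to zero while keeping performance competitive.

📄 License

This model is released under the Apache-2.0 license.

🚀 Quick start

This model was pruned to 30% sparsity with the Wanda pruning method. The method requires no retraining or weight updates and still achieves competitive performance. The base model is Mistral-7B-Instruct-v0.2.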
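As a rough illustration of the idea behind Wanda, the sketch below scores each weight by its magnitude times the L2 norm of the matching input activation, then zeroes the lowest-scoring fraction within each output row. It is a toy, pure-Python version under stated assumptions: the function name wanda_prune, the list-of-lists layout, and the calibration activations are hypothetical, and this is not the code used to produce this checkpoint.

```python
import math


def wanda_prune(weights, activations, sparsity=0.3):
    """Zero the lowest-scoring fraction of weights in each output row.

    weights:     list of rows, shape (out_features, in_features)
    activations: calibration inputs, shape (n_samples, in_features)
    Both arguments are hypothetical toy inputs for illustration.
    """
    n_in = len(weights[0])
    # L2 norm of each input feature across the calibration samples.
    feature_norms = [
        math.sqrt(sum(sample[j] ** 2 for sample in activations))
        for j in range(n_in)
    ]
    n_prune = round(n_in * sparsity)  # weights removed per output row
    pruned = []
    for row in weights:
        # Score: weight magnitude times input activation norm.
        scores = [abs(w) * feature_norms[j] for j, w in enumerate(row)]
        # Indices of the n_prune lowest scores within this row.
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned
```

Because the score includes the activation norm, a small weight on a strongly activated input can survive while a larger weight on a weak input is removed, which is where this differs from plain magnitude pruning. No gradient step or weight update is involved.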

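The Mistral-7B-Instruct-v0.2 large language model (LLM) is an improved instruction-tuned version of Mistral-7B-Instruct-v0.1.

For full details of this model, please read our paper and release blog post.

✨ Key features

Instruction format

To take advantage of instruction tuning, wrap your prompt in [INST] and [/INST] tokens. The very first instruction should begin with the begin-of-sentence id; later instructions should not. The assistant's generation ends with the end-of-sentence token id.

For example: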

text = "<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"

This format is also available as a chat template through the apply_chat_template() method, as the usage example below shows.
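To make the string layout above concrete, here is a minimal sketch that assembles it by hand. The helper name build_prompt and the role-checking behavior are assumptions made for illustration; in real use the tokenizer's chat template handles the special tokens.

```python
BOS, EOS = "<s>", "</s>"


def build_prompt(messages):
    """Render alternating user/assistant messages in the [INST] format."""
    prompt = BOS  # only the first instruction is preceded by the BOS marker
    for i, message in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            prompt += f"[INST] {message['content']} [/INST]"
        else:
            # Each assistant turn is closed by the EOS marker and a space.
            prompt += f"{message['content']}{EOS} "
    return prompt
```

Applied to a user, assistant, user exchange, this reproduces the concatenated string from the example: one leading `<s>`, each user turn wrapped in [INST] and [/INST], and each assistant turn followed by `</s>`.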

💻 Usage example

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # the device to load the model onto

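# NOTE: this ID is the unpruned base model; to load the sparse weights,
# substitute the repository ID of the pruned checkpoint.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")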

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
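For decoder-only models, generate() returns the prompt tokens followed by the new tokens, so the printed text repeats the conversation. A minimal sketch of trimming the prompt is shown here; the helper name new_tokens_only is hypothetical, and it works on any sequence of sliceable rows.

```python
def new_tokens_only(generated_ids, prompt_length):
    """Drop the first prompt_length token ids from every generated sequence."""
    return [sequence[prompt_length:] for sequence in generated_ids]
```

With the example above, prompt_length would be model_inputs.shape[1]. The trimmed rows can then be decoded one at a time with tokenizer.decode(row, skip_special_tokens=True) to print only the reply.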

🔧 Technical details

Model architecture

This instruction model is based on Mistral-7B-v0.1, a Transformer model with the following architecture choices:

Grouped-Query Attention
Sliding-Window Attention
Byte-fallback BPE tokenizer
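The first two choices can be sketched in a few lines. The code below is an illustrative assumption, not the model's implementation: the function names are hypothetical, and the exact window boundary convention varies between implementations.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j.

    Causal attention restricted to the most recent `window` positions.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one key/value head."""
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size
```

With seq_len=5 and window=3, the last position attends only to positions 2, 3, and 4. With 32 query heads and 8 key/value heads (numbers chosen here for illustration), query heads 0 through 3 all read from key/value head 0, which shrinks the key/value cache by a factor of four relative to one key/value head per query head.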

Troubleshooting

If you see the following error:

Traceback (most recent call last):
  File "", line 1, in
  File "/transformers/models/auto/auto_factory.py", line 482, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/transformers/models/auto/configuration_auto.py", line 1022, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/transformers/models/auto/configuration_auto.py", line 723, in __getitem__
    raise KeyError(key)
KeyError: 'mistral'

Installing transformers from source should resolve it:

pip install git+https://github.com/huggingface/transformers

This should not be required after transformers-v4.33.4.
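A quick way to tell whether an installed version falls in the affected range is to compare its release numbers against 4.33.4. The sketch below is a hypothetical helper, not part of transformers; it assumes a plain dotted version string and ignores suffixes such as ".dev0".

```python
import re


def release_tuple(version):
    """Extract the leading numeric release, e.g. '4.34.0.dev0' -> (4, 34, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def may_lack_mistral_support(version, last_affected=(4, 33, 4)):
    """True when the version is at or below the release named in the text."""
    return release_tuple(version) <= last_affected
```

In practice the argument would be transformers.__version__. A True result suggests upgrading or installing from source before loading the model.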

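Limitations

The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It has no moderation mechanisms. We look forward to working with the community on ways to make the model respect guardrails more reliably, so that it can be deployed in environments that require moderated outputs.

Development team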

Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.