Mistral-7B-Instruct-v0.2 open-source large language model - 30% sparsity, strong performance without retraining

Mistral 7B Instruct V0.2 Sparsity 30 V0.1

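Developed by wang7776

Mistral-7B-Instruct-v0.2 is an instruction-tuned large language model that improves on Mistral-7B-Instruct-v0.1. This variant is pruned to 30% sparsity with the Wanda method and keeps competitive performance without any retraining.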

Large Language Model

Transformers

License: Apache-2.0 #instruction-tuning #retraining-free-pruning #chat-template-support

Downloads: 75

Released: 1/17/2024

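Model Overview

This is an instruction-tuned large language model optimized for dialogue and instruction following. It suits applications that need natural language understanding and generation.

Model Features

Wanda pruning

Pruned to 30% sparsity with the Wanda method; performance stays competitive with no retraining or weight updates.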
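To make the idea concrete, here is a minimal sketch of Wanda-style scoring as the Wanda paper describes it: each weight is scored by its magnitude times the L2 norm of the matching input activation, and the lowest-scoring weights are zeroed within each output row. The toy matrices and the function names are hypothetical, and this is not the code used to produce this checkpoint.

```python
import math

def wanda_prune_row(weights, act_norms, sparsity):
    """Zero the lowest-scoring fraction of one output row.

    Score of each weight = |weight| * L2 norm of its input feature.
    """
    scores = [abs(w) * n for w, n in zip(weights, act_norms)]
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune smallest scores in this row.
    order = sorted(range(len(weights)), key=scores.__getitem__)
    prune_idx = set(order[:n_prune])
    return [0.0 if i in prune_idx else w for i, w in enumerate(weights)]

def wanda_prune(weight, activations, sparsity=0.3):
    """weight: rows are output features; activations: rows are calibration tokens."""
    n_in = len(weight[0])
    # L2 norm of each input feature across all calibration tokens.
    act_norms = [
        math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_in)
    ]
    return [wanda_prune_row(row, act_norms, sparsity) for row in weight]

# Toy data (hypothetical): 2 output rows, 10 input features, 2 calibration tokens.
weight = [
    [0.5, -0.1, 0.3, 0.9, -0.2, 0.05, 0.7, -0.4, 0.6, 0.25],
    [-0.8, 0.2, 0.1, -0.3, 0.45, 0.9, -0.05, 0.35, 0.25, -0.6],
]
activations = [
    [1.0, 4.0, 0.5, 0.1, 2.0, 3.0, 0.2, 1.0, 0.5, 1.0],
    [1.0, 3.0, 0.5, 0.1, 2.0, 4.0, 0.2, 1.0, 0.5, 1.0],
]
pruned = wanda_prune(weight, activations)
for row in pruned:
    print(row)
```

In the first toy row the largest weight (0.9) is pruned because its input feature is almost always near zero, while the small weight -0.1 survives because its input feature is large. Plain magnitude pruning would make the opposite choice.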

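Enhanced instruction tuning

Improves on the v0.1 instruction tuning, with better dialogue and instruction following.

Efficient attention

Uses grouped-query attention and sliding-window attention to improve computational efficiency.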
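As an illustration of sliding-window attention, the sketch below builds a causal mask in which each position attends only to itself and a fixed number of preceding positions. The window size of 3 is illustrative only, and the boundary convention (the window counts the current token) is an assumption of this sketch rather than a statement about the model's configuration.

```python
def sliding_window_causal_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    # "x" marks an allowed key position, "." a masked one.
    print("".join("x" if allowed else "." for allowed in row))
```

Each row holds at most `window` allowed positions, so the attention cost per token stays bounded as the sequence grows.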

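Model Capabilities

Natural language understanding

Text generation

Dialogue systems

Instruction following

Use Cases

Dialogue systems

Intelligent assistants

Build conversational assistants that understand and respond to user queries.

Generates natural, fluent conversational responses.

Content generation

Creative writing

Generates stories, poems, and other creative text.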

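🚀 Mistral-7B-Instruct-v0.2 Model Introduction

This project is a variant of Mistral-7B-Instruct-v0.2, pruned with the Wanda method, for text generation tasks. Pruning removes 30% of the weights while keeping performance competitive.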

📄 License

This model is released under the Apache-2.0 license.

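🚀 Quick Start

This model is pruned to 30% sparsity with the Wanda pruning method. Wanda requires no retraining or weight updates and still achieves competitive performance. The base model is Mistral-7B-Instruct-v0.2.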
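One way to sanity-check a sparsity figure like this is to count the exactly-zero entries in each weight matrix. The sketch below does that on toy nested lists; the layer names and values are illustrative stand-ins, not weights read from this checkpoint.

```python
def zero_fraction(matrix):
    """Fraction of entries in a nested list that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total

def report_sparsity(named_matrices):
    """Map each matrix name to its fraction of zero entries."""
    return {name: zero_fraction(m) for name, m in named_matrices.items()}

# Toy stand-ins for pruned weight matrices (hypothetical values).
layers = {
    "q_proj": [[0.0, 0.4, -0.2, 0.0, 0.7], [0.1, 0.0, 0.3, -0.5, 0.9]],
    "k_proj": [[0.2, 0.0, 0.0, 0.6, -0.1], [0.0, 0.8, -0.3, 0.5, 0.4]],
}
report = report_sparsity(layers)
for name, fraction in report.items():
    print(f"{name}: {fraction:.0%} zeros")
```

With a loaded model, the same ratio could be computed per linear layer by iterating over its weight tensors instead of nested lists.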

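The Mistral-7B-Instruct-v0.2 large language model (LLM) is an improved instruction fine-tuned version of Mistral-7B-Instruct-v0.1.

For full details of this model, please read our paper and release blog post.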

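✨ Key Features

Instruction format

To take advantage of instruction fine-tuning, surround your prompt with [INST] and [/INST] tokens. The very first instruction should begin with a begin-of-sentence id; subsequent instructions should not. The assistant's generation is terminated by the end-of-sentence token id.

For example: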

text = (
    "<s>[INST] What is your favourite condiment? [/INST]"
    "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
    "[INST] Do you have mayonnaise recipes? [/INST]"
)

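The format rules above can be sketched as a small string builder. The helper name `build_prompt` is hypothetical, and the whitespace simply mirrors the literal example string in this section; the tokenizer's own chat template may place spaces and special tokens slightly differently, so treat this as an illustration rather than a replacement for it.

```python
BOS, EOS = "<s>", "</s>"

def build_prompt(messages):
    """Render alternating user/assistant messages in the [INST] format."""
    parts = [BOS]  # the begin-of-sentence marker appears once, at the start
    for message in messages:
        if message["role"] == "user":
            parts.append(f"[INST] {message['content']} [/INST]")
        elif message["role"] == "assistant":
            # Each assistant turn is closed by the end-of-sentence marker.
            parts.append(f"{message['content']}{EOS} ")
        else:
            raise ValueError(f"unsupported role: {message['role']}")
    return "".join(parts)

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
prompt = build_prompt(messages)
print(prompt)
```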
This format is also available as a chat template through the apply_chat_template() method:

💻 Usage Examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # the device to load the model onto

# Note: this ID points at the unpruned base model; to run the pruned
# checkpoint, substitute its repository ID in both calls.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Render the conversation with the chat template and tokenize it
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

# Sample up to 1000 new tokens, then decode prompt and completion together
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

🔧 Technical Details

Model architecture

This instruction model is based on Mistral-7B-v0.1, a transformer model with the following architecture choices:

Grouped-Query Attention
Sliding-Window Attention
Byte-fallback BPE tokenizer
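To illustrate grouped-query attention from the list above: several query heads share one key/value head, which shrinks the key/value cache. The sketch below shows only the head-to-group mapping, assuming contiguous groups; the function name is hypothetical and the head counts (32 query heads, 8 key/value heads) are used as illustrative values.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Index of the key/value head shared by a given query head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

n_q_heads, n_kv_heads = 32, 8  # illustrative head counts

# Collect the query heads that share each key/value head.
groups = {}
for q in range(n_q_heads):
    kv = kv_head_for_query_head(q, n_q_heads, n_kv_heads)
    groups.setdefault(kv, []).append(q)

print(groups[0])
print(len(groups))
```

With these values, keys and values are stored for 8 heads instead of 32, so the key/value cache is a quarter of the size it would be with one key/value head per query head.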

Troubleshooting

If you see the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/transformers/models/auto/auto_factory.py", line 482, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/transformers/models/auto/configuration_auto.py", line 1022, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/transformers/models/auto/configuration_auto.py", line 723, in __getitem__
    raise KeyError(key)
KeyError: 'mistral'

Installing transformers from source should solve the issue:

pip install git+https://github.com/huggingface/transformers

This should not be required after transformers-v4.33.4.
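A quick way to tell whether the workaround applies is to compare the installed version against 4.33.4. The sketch below uses only the standard library; the helper names are hypothetical, and the parser deliberately ignores suffixes such as `.dev0`.

```python
import re

def parse_version(version):
    """Leading numeric components, e.g. '4.35.0.dev0' -> (4, 35, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))

def needs_source_install(installed, last_unsupported="4.33.4"):
    """True when the installed version is not newer than the cutoff."""
    return parse_version(installed) <= parse_version(last_unsupported)

for candidate in ("4.33.4", "4.34.0", "4.35.0.dev0"):
    print(candidate, needs_source_install(candidate))
```

The installed version string can be read with `importlib.metadata.version("transformers")` and passed to `needs_source_install`.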

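Limitations

The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We look forward to working with the community on ways to make the model respect guardrails more closely, so that it can be deployed in environments that require moderated outputs.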

Development Team

Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.