🚀 Mixtral-8x7B Large Language Model
The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.
🚀 Quick Start
Model information

| Property | Details |
| --- | --- |
| Supported languages | French, Italian, German, Spanish, English |
| License | Apache-2.0 |
| Base model | mistralai/Mixtral-8x7B-v0.1 |
⚠️ Important note
If you want to learn more about how we process your personal data, please read our Privacy Policy.
✨ Key Features
Mixtral-8x7B is a powerful large language model that performs strongly across many benchmarks. It can be run for inference in several ways, including with `mistral_inference` and with Hugging Face's `transformers` library, and it supports different precision settings to balance memory usage against performance.
📦 Installation
The source documentation does not list explicit installation steps.
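As a rough guide (exact versions are not specified by the source), the examples below rely on the `mistral-inference`, `mistral-common`, `transformers`, and `torch` packages, which can typically be installed from PyPI, e.g. `pip install mistral-inference mistral-common transformers torch`.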
💻 Usage Examples
Basic usage
Tokenization with `mistral-common`

```python
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Path to the locally downloaded model files (placeholder).
mistral_models_path = "MISTRAL_MODELS_PATH"

tokenizer = MistralTokenizer.v1()

# Build a chat completion request and tokenize it with the reference tokenizer.
completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])
tokens = tokenizer.encode_chat_completion(completion_request).tokens
```
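`MISTRAL_MODELS_PATH` above is a placeholder; the weights must already be on disk for the `mistral_inference` example below. A minimal sketch for fetching them with `huggingface_hub` (the local directory name is an assumption, not from the original docs):

```python
from pathlib import Path
from huggingface_hub import snapshot_download

# Hypothetical local directory; adjust to your setup.
mistral_models_path = Path.home().joinpath("mistral_models", "8x7B-Instruct-v0.1")
mistral_models_path.mkdir(parents=True, exist_ok=True)

# Download the repository contents into the local directory.
snapshot_download(repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1", local_dir=mistral_models_path)
```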
Inference with `mistral_inference`

```python
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

# Load the model from the local folder and generate a completion.
model = Transformer.from_folder(mistral_models_path)
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)

result = tokenizer.decode(out_tokens[0])
print(result)
```
Inference with Hugging Face `transformers`

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
model.to("cuda")

# Wrap the mistral-common token ids in a batched tensor before generating.
input_ids = torch.tensor([tokens]).to("cuda")
generated_ids = model.generate(input_ids, max_new_tokens=1000, do_sample=True)

# Decode with the mistral tokenizer.
result = tokenizer.decode(generated_ids[0].tolist())
print(result)
```
Advanced usage
Running the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Apply the chat template and generate on GPU.
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Half-precision inference
Note: `float16` precision only works on GPU devices.
```diff
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Lower-precision (8-bit and 4-bit) inference with `bitsandbytes`
```diff
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

+ model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
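Note: `load_in_4bit=True` requires the `bitsandbytes` package. On newer `transformers` releases the equivalent, as far as we know, is to pass a quantization config instead, e.g. `quantization_config=BitsAndBytesConfig(load_in_4bit=True)` (with `BitsAndBytesConfig` imported from `transformers`).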
Loading the model with Flash Attention 2
```diff
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

+ model = AutoModelForCausalLM.from_pretrained(model_id, use_flash_attention_2=True, device_map="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
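Note: this requires the `flash-attn` package to be installed. On recent `transformers` releases, `use_flash_attention_2=True` has been superseded by `attn_implementation="flash_attention_2"`, which should be preferred where available.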
📚 Detailed Documentation
Instruction format
This format must be strictly respected, otherwise the model will generate sub-optimal outputs. The template used to build a prompt for the Instruct model is defined as follows:

```
<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
```
Note that `<s>` and `</s>` are special tokens for the beginning of string (BOS) and end of string (EOS), while `[INST]` and `[/INST]` are regular strings.
As a reference, here is the pseudo-code used to tokenize instructions during fine-tuning:

```python
def tokenize(text):
    return tok.encode(text, add_special_tokens=False)

[BOS_ID] +
tokenize("[INST]") + tokenize(USER_MESSAGE_1) + tokenize("[/INST]") +
tokenize(BOT_MESSAGE_1) + [EOS_ID] +
…
tokenize("[INST]") + tokenize(USER_MESSAGE_N) + tokenize("[/INST]") +
tokenize(BOT_MESSAGE_N) + [EOS_ID]
```
In the pseudo-code above, note that the `tokenize` method should not add a BOS or EOS token automatically, but should add a prefix space. In the Transformers library, chat templates can be used to make sure the correct format is applied.
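As an illustration (a minimal sketch; the exact rendered string depends on the chat template shipped with the tokenizer), `apply_chat_template` can render the conversation as text so the format can be inspected:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Render the conversation as a string instead of token ids.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
# Expected shape (roughly): <s>[INST] ... [/INST] ... </s>[INST] ... [/INST]
```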
Model limitations
The Mixtral-8x7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We look forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
🔧 Technical Details
The source documentation does not provide further technical details.
📄 License
This project is licensed under the Apache-2.0 license.
Team
The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Louis Ternon, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
💡 Usage Tips
PRs to correct the `transformers` tokenizer so that it gives results identical to the `mistral-common` reference implementation are very welcome!
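A minimal parity-check sketch (assuming both packages are installed; the comparison logic is an illustration, not part of the original docs):

```python
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from transformers import AutoTokenizer

content = "Explain Machine Learning to me in a nutshell."

# Reference token ids from mistral-common.
ref_tokenizer = MistralTokenizer.v1()
ref_ids = ref_tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content=content)])
).tokens

# Token ids from the transformers chat template (returns ids by default).
hf_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
hf_ids = hf_tokenizer.apply_chat_template([{"role": "user", "content": content}])

print("identical:", list(hf_ids) == list(ref_ids))
```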



