🚀 Model Card for Mixtral-8x22B
The Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts model. Mistral AI eventually released the weights to the official Mistral AI organization, providing both the base model and the instruct-tuned model. You can find the corresponding models at the following links:
The HuggingFace staff cloned this repository into a new official repository, mistral-community/Mixtral-8x22B-v0.1, which you can also download from if you prefer. Thanks, HuggingFace staff! Also, there is a very cute song here!
The model was converted to the HuggingFace Transformers format using the script here.
🚀 Quick Start
Run the model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v2ray/Mixtral-8x22B-v0.1"

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt and generate a short completion.
text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
By default, the transformers library loads the model in full precision. You may therefore be interested in the optimizations we provide through the HuggingFace ecosystem, which can further reduce the memory required to run the model:
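To see why these optimizations matter, here is a rough back-of-the-envelope estimate of the weight memory at different precisions (a sketch; the ~141B total parameter count is the commonly reported size of Mixtral-8x22B, not a figure from this card):

# Rough weight-memory estimate per dtype; activations and the KV cache add more.
total_params = 141e9  # commonly reported total parameter count for Mixtral-8x22B
bytes_per_param = {"float32": 4, "float16": 2, "int8": 1, "int4": 0.5}
for dtype, width in bytes_per_param.items():
    print(f"{dtype}: ~{total_params * width / 1e9:.0f} GB")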
Run in half-precision
Note that float16 precision only works on GPU devices.
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in float16 and move the model to the first GPU.
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(0)

text = "Hello my name is"
# The inputs must live on the same device as the model.
+ inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
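As a side note, bfloat16 costs the same memory as float16 but has a wider dynamic range, which makes it a common alternative on Ampere-or-newer GPUs; a minimal sketch of the same load with bfloat16 (an assumption about your hardware, not part of the original snippet):

import torch
from transformers import AutoModelForCausalLM

model_id = "v2ray/Mixtral-8x22B-v0.1"
# bfloat16: same 2 bytes per parameter as float16, wider exponent range.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(0)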
Run in lower precision (8-bit and 4-bit) using the bitsandbytes library
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Quantize the weights to 4-bit on the fly; requires bitsandbytes and accelerate,
# and the model is placed on the GPU automatically.
+ model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
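Passing load_in_4bit=True directly works on the transformers versions this card targets; on newer releases the same setting is usually expressed through a BitsAndBytesConfig object. A minimal sketch of that variant (assuming a recent transformers with bitsandbytes and accelerate installed):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "v2ray/Mixtral-8x22B-v0.1"
# 4-bit NF4 storage with float16 compute for the matrix multiplications.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)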
Load the model with Flash Attention 2
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Flash Attention 2 requires a half-precision dtype and a GPU, so load the
# weights in float16 and move the model to the first GPU.
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, use_flash_attention_2=True).to(0)

text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
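On recent transformers releases the use_flash_attention_2 flag is deprecated in favor of the attn_implementation argument; a sketch of the equivalent call (assuming flash-attn is installed):

import torch
from transformers import AutoModelForCausalLM

model_id = "v2ray/Mixtral-8x22B-v0.1"
# Select the Flash Attention 2 backend explicitly; still requires a half-precision dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to(0)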
⚠️ Notice
Mixtral-8x22B-v0.1 is a pretrained base model and therefore does not have any moderation mechanisms.
📄 License
This model is released under the Apache-2.0 license.
👥 The Mistral AI Team
Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall.