🚀 Mixtral-8x22B Model Card
The Mixtral-8x22B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mistral AI has since released the weights under the official Mistral AI organization, where both the base model and the instruction-tuned model are available.
The Hugging Face staff have cloned this repository into a new official repository, mistral-community/Mixtral-8x22B-v0.1, so you can also download the model from there if you prefer. Thanks to the Hugging Face staff! Also, there is a very cute song here!
The model was converted to the Hugging Face Transformers format using the script here.
🚀 Quick Start
Run the model
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
By default, the transformers library loads the model in full precision. You may therefore be interested in the optimizations available in the Hugging Face ecosystem, which further reduce the memory required to run the model:
Run in half precision
Note that float16 precision only works on GPU devices.
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to(0)
text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
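
In half precision the 8x22B weights still occupy roughly 280 GB, so a single GPU will usually not hold them. Below is a minimal sketch, assuming the accelerate package is installed, that lets Transformers shard the layers across all visible GPUs with device_map="auto":

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" (requires accelerate) spreads the fp16 weights over every visible GPU
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))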
Run in lower precision (8-bit and 4-bit) using the bitsandbytes library
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
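
Recent transformers releases express the same 4-bit loading through a BitsAndBytesConfig object instead of the bare load_in_4bit flag; a minimal sketch, assuming bitsandbytes and accelerate are installed:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit NF4 quantization with float16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config, device_map="auto")

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))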
Load the model with Flash Attention 2
+ import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, use_flash_attention_2=True).to(0)
text = "Hello my name is"
+ inputs = tokenizer(text, return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
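
Newer transformers releases request Flash Attention 2 through the attn_implementation argument instead; a minimal sketch, assuming the flash-attn package is installed and the model is loaded in half precision on GPU (Flash Attention 2 only supports fp16/bf16):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v2ray/Mixtral-8x22B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Flash Attention 2 needs the flash-attn package and fp16/bf16 weights on a CUDA device
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))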
⚠️ Notice
Mixtral-8x22B-v0.1 is a pretrained base model and therefore does not have any moderation mechanisms.
📄 License
This project is released under the Apache-2.0 license.
👥 The Mistral AI Team
Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Bam4d, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Jean-Malo Delignon, Jia Li, Justus Murke, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Nicolas Schuhl, Patrick von Platen, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibaut Lavril, Timothée Lacroix, Théophile Gervet, Thomas Wang, Valera Nemychnikova, William El Sayed, William Marshall.