🚀 mistralai/Mistral-7B-Instruct-v0.3 AWQ
This project is an AWQ-quantized version of the Mistral-7B-Instruct-v0.3 model. It handles text-generation tasks efficiently and enables fast inference on supported hardware and software.
🚀 Quick Start
Install the required packages:
```shell
pip install --upgrade autoawq autoawq-kernels
```
Python code example
Basic usage
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

model_path = "solidrust/Mistral-7B-Instruct-v0.3-AWQ"
system_message = ("You are Mistral-7B-Instruct-v0.3, incarnated as a powerful AI. "
                  "You were created by mistralai.")

# Load the quantized model and its tokenizer
model = AutoAWQForCausalLM.from_quantized(model_path,
                                          fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)

# Stream generated text token by token, hiding the prompt and special tokens
streamer = TextStreamer(tokenizer,
                        skip_prompt=True,
                        skip_special_tokens=True)

# ChatML-style prompt template used by this quantized checkpoint
prompt_template = """\
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

prompt = ("You're standing on the surface of the Earth. "
          "You walk one mile south, one mile west and one mile north. "
          "You end up exactly where you started. Where are you?")

# Tokenize the formatted prompt and move the input IDs to the GPU
tokens = tokenizer(prompt_template.format(system_message=system_message, prompt=prompt),
                   return_tensors='pt').input_ids.cuda()

# Generate up to 512 new tokens, streaming the output as it is produced
generation_output = model.generate(tokens,
                                   streamer=streamer,
                                   max_new_tokens=512)
```
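The prompt template above is plain Python string formatting, so it can be sanity-checked without downloading the model or using a GPU. A minimal standalone sketch (the system message and question here are illustrative placeholders, not values from the model card):

```python
# Standalone check of the ChatML-style prompt assembly used in the example above.
# No model, tokenizer, or GPU is required.
prompt_template = """\
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

rendered = prompt_template.format(
    system_message="You are a helpful assistant.",  # placeholder
    prompt="Where am I?",                           # placeholder
)
print(rendered)
```

The rendered string ends with the opening assistant tag, so the model's generation continues directly as the assistant's reply.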
About AWQ
AWQ is an efficient, accurate, and very fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared with GPTQ, it delivers faster Transformer-based inference with quality that matches or exceeds the most commonly used GPTQ settings.
AWQ models are currently supported only on Linux and Windows, and only on NVIDIA GPUs. macOS users should use GGUF models instead.
It is supported by a range of downstream inference tools.
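As a rough illustration of why 4-bit weights matter, here is a back-of-envelope estimate of weight storage for a 7B-parameter model (these are order-of-magnitude numbers that ignore activation memory, the KV cache, and quantization overhead such as scales and zero points):

```python
# Back-of-envelope weight-memory estimate for a 7B-parameter model.
params = 7_000_000_000

def weight_gib(bits_per_weight: float) -> float:
    """Approximate weight storage in GiB at a given precision."""
    return params * bits_per_weight / 8 / 2**30

fp16 = weight_gib(16)  # half precision: roughly 13 GiB
awq4 = weight_gib(4)   # 4-bit AWQ weights: roughly 3.3 GiB
print(f"fp16: {fp16:.1f} GiB, 4-bit AWQ: {awq4:.1f} GiB")
```

The roughly 4x reduction in weight memory is what lets a 7B model fit on consumer GPUs with much less VRAM.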
📚 Documentation

| Property | Details |
| --- | --- |
| Base model | mistralai/Mistral-7B-Instruct-v0.3 |
| Inference | No |
| Library name | transformers |
| License | apache-2.0 |
| Pipeline task | text-generation |
| Quantized by | Suparious |
| Tags | 4-bit, AWQ, text-generation, autotrain_compatible, endpoints_compatible |
📄 License
This model is released under the apache-2.0 license.