# 🚀 mistralai/Mistral-7B-Instruct-v0.3 AWQ
This project provides an AWQ-quantized version of the Mistral-7B-Instruct-v0.3 model. It handles text generation tasks efficiently and delivers fast inference on supported hardware and software.
## 🚀 Quick Start

### Install the necessary packages

```shell
pip install --upgrade autoawq autoawq-kernels
```
### Python example

#### Basic usage

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer

model_path = "solidrust/Mistral-7B-Instruct-v0.3-AWQ"
system_message = "You are Mistral-7B-Instruct-v0.3, incarnated as a powerful AI. You were created by mistralai."

# Load the quantized model and its tokenizer
model = AutoAWQForCausalLM.from_quantized(model_path,
                                          fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer,
                        skip_prompt=True,
                        skip_special_tokens=True)

# ChatML-style prompt template
prompt_template = """\
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

prompt = "You're standing on the surface of the Earth. "\
         "You walk one mile south, one mile west and one mile north. "\
         "You end up exactly where you started. Where are you?"

# Tokenize the formatted prompt and move it to the GPU
tokens = tokenizer(prompt_template.format(system_message=system_message, prompt=prompt),
                   return_tensors='pt').input_ids.cuda()

# Generate output, streamed token by token
generation_output = model.generate(tokens,
                                   streamer=streamer,
                                   max_new_tokens=512)
```
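The prompt formatting above can be verified without a GPU or the model itself. The sketch below (plain Python; `build_prompt` is a hypothetical helper name, not part of the card) shows the exact string that gets tokenized:

```python
# ChatML-style template, copied from the snippet above.
prompt_template = """\
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"""

def build_prompt(system_message: str, prompt: str) -> str:
    """Fill the template; this string is what the tokenizer receives."""
    return prompt_template.format(system_message=system_message, prompt=prompt)

text = build_prompt("You are a helpful assistant.", "Hello!")
print(text)
```

Note that the template ends at `<|im_start|>assistant` with no trailing newline, so generation continues directly from the assistant turn.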
## About AWQ
AWQ is an efficient, accurate, and very fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared with GPTQ, it offers faster Transformers-based inference with quality equivalent to or better than the most commonly used GPTQ settings.
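For reference, producing an AWQ checkpoint like this one with the AutoAWQ library follows roughly the recipe below. This is a sketch, not the quantizer's actual script: the 4-bit settings shown are AutoAWQ's commonly used GEMM defaults, the output directory name is illustrative, and calibration needs a CUDA GPU, so the heavy step is wrapped in a function and left commented out:

```python
# Commonly used AutoAWQ 4-bit settings (group size 128, GEMM kernels).
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

def quantize_to_awq(model_path: str, out_dir: str) -> None:
    # Heavy step: downloads the fp16 model and runs AWQ calibration on a CUDA GPU.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(out_dir)
    tokenizer.save_pretrained(out_dir)

# quantize_to_awq("mistralai/Mistral-7B-Instruct-v0.3", "Mistral-7B-Instruct-v0.3-AWQ")
```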
AWQ models are currently supported only on Linux and Windows, and only with NVIDIA GPUs. macOS users should use GGUF models instead.
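This platform constraint can be pre-checked before attempting to load the model. Below is a minimal pre-flight sketch (the `nvidia-smi` lookup is only a heuristic for an installed NVIDIA driver, not an authoritative CUDA check; `awq_environment_ok` is a hypothetical helper name):

```python
import platform
import shutil

def awq_environment_ok() -> tuple:
    """Return (ok, reason) for whether AWQ inference can plausibly run here."""
    system = platform.system()
    if system not in ("Linux", "Windows"):
        # AWQ kernels do not run on macOS; GGUF builds are the alternative.
        return False, f"{system} is unsupported for AWQ; use a GGUF model instead"
    if shutil.which("nvidia-smi") is None:
        # Heuristic: no NVIDIA driver tooling on PATH.
        return False, "no NVIDIA driver detected (nvidia-smi not found)"
    return True, "ok"

ok, reason = awq_environment_ok()
print(ok, reason)
```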
It is supported by the following tools:
## 📚 Documentation
| Property | Details |
| --- | --- |
| Base model | mistralai/Mistral-7B-Instruct-v0.3 |
| Inference | No |
| Library name | transformers |
| License | apache-2.0 |
| Pipeline task | Text generation |
| Quantized by | Suparious |
| Tags | 4-bit, AWQ, text-generation, autotrain_compatible, endpoints_compatible |
## 📄 License

This model is released under the apache-2.0 license.