Qwen2-7B-int4-inc: an open-source model for efficient inference

Qwen2 7B Int4 Inc

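Developed by Intel

An INT4 quantized model based on Qwen2-7B, generated with Intel's auto-round tool and intended for efficient inference.

Large language model

Transformers

License: Apache-2.0 #4-bit quantization #Chinese-optimized #large language model

Downloads: 48

Released: 6/5/2024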

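Model overview

This model is a 4-bit integer (INT4) quantized version of the Qwen2-7B large language model. Automatic quantization improves inference efficiency while keeping accuracy close to the original: the average benchmark score reported on this page is 0.6604 for INT4 versus 0.6659 for BF16.

Model features

Efficient INT4 quantization

Uses 4-bit integer quantization to substantially reduce model size and memory footprint while retaining most of the original accuracy.

Automatic quantization tuning

The auto-round tool tunes the quantization process automatically, with no manual adjustment required.

Multi-platform support

Runs on standard GPUs and CPUs as well as Intel Gaudi-2 accelerators.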
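
As a back-of-the-envelope check on the size reduction described under INT4 quantization above, the sketch below compares a 16-bit checkpoint with a 4-bit one. Only the 4-bit width and the group size of 128 come from this page. The nominal parameter count of 7 billion, the 16-bit scale, and the 4-bit zero point per group are illustrative assumptions, and real checkpoints usually keep some layers (such as embeddings) in higher precision, so actual file sizes will differ.

```python
# Rough storage estimate; every constant below is an illustrative assumption
# except WEIGHT_BITS and GROUP_SIZE, which are stated on this page.
NOMINAL_PARAMS = 7_000_000_000  # "7B" taken at face value, not the exact count
GROUP_SIZE = 128                # one scale and zero point per 128 weights
WEIGHT_BITS = 4
SCALE_BITS = 16                 # assumed fp16 scale per group
ZERO_POINT_BITS = 4             # assumed 4-bit zero point per group


def bits_per_weight(weight_bits, group_size, scale_bits, zero_point_bits):
    """Effective bits per weight, including per-group metadata."""
    return weight_bits + (scale_bits + zero_point_bits) / group_size


def size_gb(num_params, bits):
    """Storage in gigabytes (10^9 bytes) for num_params weights."""
    return num_params * bits / 8 / 1e9


int4_bits = bits_per_weight(WEIGHT_BITS, GROUP_SIZE, SCALE_BITS, ZERO_POINT_BITS)
bf16_gb = size_gb(NOMINAL_PARAMS, 16)
int4_gb = size_gb(NOMINAL_PARAMS, int4_bits)

print(f"effective bits per weight: {int4_bits}")
print(f"BF16: {bf16_gb:.2f} GB, INT4: {int4_gb:.2f} GB, ratio: {bf16_gb / int4_gb:.2f}x")
```

Under these assumptions the weights shrink from about 14 GB to about 3.6 GB; the per-group metadata adds only about 0.16 bits per weight.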

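Model capabilities

Chinese text generation

English text generation

Math problem solving

Commonsense reasoning

Knowledge question answering

Use cases

Content generation

Company introduction generation

Generates an introduction to a company or product from keywords.

Example (translated from the Chinese output): 'Alibaba is a world-leading e-commerce company...'

Story writing

Continues a story from an opening prompt.

Example: 'Once upon a time, there was a little girl named Alice...'

Educational assistance

Math problem solving

Answers basic numeric comparison and calculation questions.

Example (translated from the Chinese output): '9.8 is larger than 9.11 by 0.7'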
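
The math example above quotes the model's own answer, which states the gap as 0.7. A quick check with exact decimal arithmetic (a minimal sketch, not part of the original model card) confirms the comparison and shows the exact difference:

```python
from decimal import Decimal

# Use Decimal so the subtraction is exact rather than binary floating point.
a = Decimal("9.8")
b = Decimal("9.11")

larger = max(a, b)
difference = abs(a - b)

print(f"larger: {larger}, difference: {difference}")
```

So 9.8 is indeed the larger number, by exactly 0.69; the model's "0.7" is a rounded figure.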

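🚀 Intel/Qwen2-7B-int4-inc

This is a quantized version of Qwen/Qwen2-7B, produced as an int4 AutoRound model with the intel/auto-round tool. The model has been evaluated on a range of tasks; this page shows how to run inference and how to reproduce the evaluation.

🚀 Quick start

Model details

This model is an int4 AutoRound model of Qwen/Qwen2-7B with group size 128, generated by intel/auto-round. If you need the AutoGPTQ format, load the model with revision 07a117c.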
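
To illustrate what "int4 with group size 128" means, here is a minimal sketch of group-wise 4-bit quantization using plain round-to-nearest. This is not the AutoRound algorithm, which tunes the rounding via signed gradient descent; the function names and the asymmetric min/max scheme are assumptions chosen purely for illustration.

```python
import random


def quantize_group(values, bits=4):
    """Round-to-nearest asymmetric quantization of one group of weights."""
    qmax = 2 ** bits - 1                      # 15 for 4-bit codes
    lo, hi = min(values), max(values)
    scale = max(hi - lo, 1e-8) / qmax         # guard against a constant group
    codes = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + lo for c in codes]


def quantize_row(row, group_size=128, bits=4):
    """Split a weight row into groups, each with its own scale and offset."""
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(256)]  # toy weight row
packed = quantize_row(row)
restored = [v for codes, scale, lo in packed for v in dequantize_group(codes, scale, lo)]

max_err = max(abs(a - b) for a, b in zip(row, restored))
max_scale = max(scale for _, scale, _ in packed)
print(f"groups: {len(packed)}, max error: {max_err:.4f}, half step: {max_scale / 2:.4f}")
```

With round-to-nearest, the reconstruction error of each weight is bounded by half of its group's step size. Smaller groups give tighter ranges and smaller steps, at the cost of storing more scales and offsets.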

Installation

Inference environment:

pip install auto-round  # CPU inference requires auto-round > 0.3.1

Evaluation environment:

pip3 install lm-eval==0.4.4 auto-round

💻 Usage examples

Basic usage

INT4 inference

# pip install auto-round  (CPU inference requires auto-round > 0.3.1)
from auto_round import AutoRoundConfig  # must be imported for the auto_round format
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "Intel/Qwen2-7B-int4-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    device_map="auto",
    # revision="07a117c",  # uncomment to load the AutoGPTQ format
)

# Each prompt is completed independently with greedy decoding.
prompts = [
    "下面我来介绍一下阿里巴巴公司，",
    "9.8和9.11哪个数字大？答案是",
    "Once upon a time,",
    "There is a girl who likes adventure,",
]
for text in prompts:
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=False)
    print(tokenizer.decode(output_ids[0]))

# Sample outputs, one per prompt:
# 下面我来介绍一下阿里巴巴公司，阿里巴巴公司是全球领先的电子商务公司，成立于1999年，总部位于中国杭州。阿里巴巴公司致力于为全球中小企业提供一个在线交易平台，帮助他们拓展业务，提高销售额。阿里巴巴公司拥有多个业务板块，包括淘宝、天猫
#
# 9.8和9.11哪个数字大？答案是9.8，因为9.8比9.11大0.7。
# Once upon a time, there was a little girl named Alice who loved to read. She had a special book that she had inherited from her grandmother, and it was filled with stories of magical creatures and far-off lands. One day, Alice decided to read the book in a
# There is a girl who likes adventure, and she is always looking for new experiences. She is a bit of a thrill-seeker, and she loves to push herself to the limit. She is always up for a challenge, and she is not afraid to take risks. She is a bit

Intel Gaudi-2 INT4 inference

A Docker image with the Gaudi software stack is recommended. See the Gaudi guide for details.

import torch
import habana_frameworks.torch.core as htcore
import habana_frameworks.torch.hpu as hthpu

from auto_round import AutoRoundConfig  # must be imported for the auto_round format
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "Intel/Qwen2-7B-int4-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
# Move the model to the Gaudi device (HPU) and cast it to bfloat16.
model = AutoModelForCausalLM.from_pretrained(quantized_model_dir).to("hpu").to(torch.bfloat16)
text = "下面我来介绍一下阿里巴巴公司,"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50, do_sample=False)[0]))

Advanced usage

Evaluating the model

auto-round  --model "Intel/Qwen2-7B-int4-inc"  --eval --eval_bs 16  --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k,cmmlu,ceval-valid

Metric	BF16	INT4
Average	0.6659	0.6604
mmlu	0.6697	0.6646
cmmlu	0.8254	0.8118
ceval-valid	0.8339	0.8053
lambada_openai	0.7182	0.7136
hellaswag	0.5823	0.5752
winogrande	0.7222	0.7277
piqa	0.7911	0.7933
truthfulqa_mc1	0.3647	0.3476
openbookqa	0.3520	0.3440
boolq	0.8183	0.8223
arc_easy	0.7660	0.7635
arc_challenge	0.4505	0.4633
gsm8k (5-shot, strict match)	0.7619	0.7528
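
To make the table easier to read, the sketch below recomputes the two averages and lists the per-task change from BF16 to INT4. The scores are copied from the table; the script and its variable names are only illustrative and are not part of the original model card.

```python
# Scores copied from the table above: task -> (BF16, INT4).
scores = {
    "mmlu": (0.6697, 0.6646),
    "cmmlu": (0.8254, 0.8118),
    "ceval-valid": (0.8339, 0.8053),
    "lambada_openai": (0.7182, 0.7136),
    "hellaswag": (0.5823, 0.5752),
    "winogrande": (0.7222, 0.7277),
    "piqa": (0.7911, 0.7933),
    "truthfulqa_mc1": (0.3647, 0.3476),
    "openbookqa": (0.3520, 0.3440),
    "boolq": (0.8183, 0.8223),
    "arc_easy": (0.7660, 0.7635),
    "arc_challenge": (0.4505, 0.4633),
    "gsm8k": (0.7619, 0.7528),
}

bf16_avg = sum(bf16 for bf16, _ in scores.values()) / len(scores)
int4_avg = sum(int4 for _, int4 in scores.values()) / len(scores)
print(f"average  BF16={bf16_avg:.4f}  INT4={int4_avg:.4f}")

# Sort tasks from the largest drop to the largest gain after quantization.
deltas = sorted((int4 - bf16, task) for task, (bf16, int4) in scores.items())
for delta, task in deltas:
    print(f"{task:16s} {delta:+.4f}")
```

The recomputed averages match the first row of the table. The largest drop is on ceval-valid, followed by truthfulqa_mc1 and cmmlu, while arc_challenge, winogrande, boolq, and piqa score slightly higher after quantization.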

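Reproducing the model

Here is a sample command to reproduce the model. We found that accuracy on Chinese tasks drops noticeably, so calibrating with a high-quality Chinese dataset is recommended. However, the public datasets we tried did not yield better accuracy.

auto-round \
--model_name  Qwen/Qwen2-7B \
--device 0 \
--group_size 128 \
--nsamples 512 \
--bits 4 \
--iter 1000 \
--disable_eval \
--model_dtype "float16" \
--format 'auto_round' \
--output_dir "./tmp_autoround"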

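📚 Detailed documentation

Ethical considerations and limitations

The model can produce factually incorrect output and should not be relied on for factually accurate information. Because of the limitations of the pretrained model and the fine-tuning datasets, it may generate lewd, biased, or otherwise offensive output.

Developers should therefore perform safety testing before deploying any application of the model.

Caveats and recommendations

Users (both direct and downstream) should be made aware of the model's risks, biases, and limitations.

Useful links for learning more about Intel's AI software:

Intel Neural Compressor
Intel Extension for Transformers

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.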

Citation

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}
