Qwen2-7B-int4-inc Open-Source Model: Free and Efficient for a Range of Inference Tasks

Qwen2 7B Int4 Inc

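Developed by Intel

An INT4 auto-quantized model based on Qwen2-7B, generated with Intel's auto-round tool and suited to efficient inference.

Large language model

Transformers

License: Apache-2.0 #4-bit quantization #Chinese-optimized #Large language model

Downloads: 48

Released: 6/5/2024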

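Model Overview

This model is a 4-bit integer (INT4) quantized version of the Qwen2-7B large language model. Automatic quantization improves inference efficiency while keeping accuracy close to the BF16 baseline (average 0.6604 vs. 0.6659 across the evaluated tasks).

Model Features

Efficient INT4 quantization

4-bit integer quantization substantially reduces model size and memory footprint while retaining most of the original accuracy.

Automatic quantization tuning

The auto-round tool tunes the quantization automatically, with no manual adjustment.

Multi-platform support

Runs on standard GPUs and CPUs as well as Intel Gaudi-2 accelerators.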
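To make "4-bit integer quantization" concrete, here is a minimal sketch of group-wise quantization using plain round-to-nearest. It is an illustration only: auto-round tunes the rounding rather than using round-to-nearest, and the asymmetric scheme, the helper names (`quantize_group`, `dequantize_group`, `quantize_weights`) and the toy weights are all assumptions, not taken from the model or the library.

```python
import random


def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest (RTN) quantization of one weight group."""
    qmax = (1 << bits) - 1                  # integer codes run from 0 to 15 for 4 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0         # avoid division by zero when the group is constant
    zero_point = round(-lo / scale)
    codes = [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate float weights."""
    return [(code - zero_point) * scale for code in codes]


def quantize_weights(weights, group_size=128, bits=4):
    """Quantize a flat list of weights group by group (one scale per group)."""
    return [quantize_group(weights[i:i + group_size], bits)
            for i in range(0, len(weights), group_size)]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]   # toy data: two groups of 128
groups = quantize_weights(weights)

restored = [w for codes, scale, zp in groups for w in dequantize_group(codes, scale, zp)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
largest_scale = max(scale for _, scale, _ in groups)
print(f"groups={len(groups)} max_error={max_error:.6f} largest_scale={largest_scale:.6f}")
```

Each weight is stored as a 4-bit code plus a per-group scale and zero point, so the reconstruction error stays within about half a quantization step of the group.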

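Model Capabilities

Chinese text generation

English text generation

Math problem solving

Commonsense reasoning

Knowledge question answering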

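Use Cases

Content generation

Company introduction generation

Generates an introduction to a company or product from keywords.

Example: 'Alibaba is a world-leading e-commerce company...' (translated from the Chinese output)

Story writing

Continues a story from an opening prompt.

Example: 'Once upon a time, there was a little girl named Alice...'

Education support

Math problem solving

Answers basic numeric comparison and arithmetic questions.

Example: '9.8 is greater than 9.11 by 0.7' (model output translated verbatim; the exact difference is 0.69)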

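🚀 Intel/Qwen2-7B-int4-inc Model

This is a quantized version of Qwen/Qwen2-7B: an int4 auto-round model generated with intel/auto-round. It has been evaluated on a range of tasks, and the sections below show how to run inference and evaluation.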

🚀 Quick Start

Model Details

This model is an int4 auto-round model of Qwen/Qwen2-7B with group size 128, generated by intel/auto-round. If you need the AutoGPTQ format, load the model with revision 07a117c.
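As a rough back-of-the-envelope sketch of what int4 with group size 128 means for storage, the snippet below estimates the effective bits per weight. The per-group overhead (a 16-bit scale and a 4-bit zero point) and the 7e9 parameter count are illustrative assumptions, not figures from this model card; the actual checkpoint layout may differ, and layers that stay unquantized are ignored.

```python
def effective_bits_per_weight(bits=4, group_size=128, scale_bits=16, zero_point_bits=4):
    """Bits per weight once the per-group metadata is amortized over the group."""
    return bits + (scale_bits + zero_point_bits) / group_size


def estimated_size_gib(num_params, bits_per_weight):
    """Storage needed for num_params weights at the given bit width, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


int4_bits = effective_bits_per_weight()          # 4 + 20 / 128
hypothetical_params = 7e9                         # round number for illustration only

print(f"effective bits per weight: {int4_bits}")
print(f"int4 estimate: {estimated_size_gib(hypothetical_params, int4_bits):.2f} GiB")
print(f"16-bit estimate: {estimated_size_gib(hypothetical_params, 16):.2f} GiB")
```

Under these assumptions the group metadata adds only a small fraction of a bit per weight, which is why a group size of 128 keeps most of the size benefit of 4-bit storage.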

Installation

Inference environment:

pip install auto-round  # CPU inference needs auto-round > 0.3.1

Evaluation environment:

pip3 install lm-eval==0.4.4 auto-round
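The CPU requirement (auto-round newer than 0.3.1) can be checked before loading the model. The sketch below uses only the standard library; the helper names are hypothetical, and the version parsing is deliberately simplified (it compares the leading numeric components and ignores pre-release suffixes).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def cpu_inference_supported(installed, minimum_exclusive="0.3.1"):
    """True if the installed version is strictly newer than the stated minimum."""
    return parse_version(installed) > parse_version(minimum_exclusive)


def installed_auto_round_version():
    """Return the installed auto-round version string, or None if it is not installed."""
    try:
        return version("auto-round")
    except PackageNotFoundError:
        return None


found = installed_auto_round_version()
if found is None:
    print("auto-round is not installed")
else:
    print(f"auto-round {found}: CPU inference supported = {cpu_inference_supported(found)}")
```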

💻 Usage Examples

Basic Usage

INT4 Inference

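# pip install auto-round  (CPU inference needs auto-round > 0.3.1)
from auto_round import AutoRoundConfig  # must be imported to load the auto_round format
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "Intel/Qwen2-7B-int4-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    device_map="auto",
    # revision="07a117c",  # uncomment to load the AutoGPTQ-format revision
)

# Each assignment overwrites the previous one; keep only the prompt you want to run.
text = "下面我來介紹一下阿里巴巴公司，"  # Chinese: "Let me introduce Alibaba,"
text = "9.8和9.11哪個數字大？答案是"  # Chinese: "Which is larger, 9.8 or 9.11? The answer is"
text = "Once upon a time,"
text = "There is a girl who likes adventure,"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50, do_sample=False)[0]))

# Sample outputs, one per prompt above (greedy decoding, 50 new tokens):
# 下面我來介紹一下阿里巴巴公司，阿里巴巴公司是全球領先的電子商務公司，成立於1999年，總部位於中國杭州。阿里巴巴公司致力於為全球中小企業提供一個在線交易平臺，幫助他們拓展業務，提高銷售額。阿里巴巴公司擁有多個業務板塊，包括淘寶、天貓
#   (English: Let me introduce Alibaba. Alibaba is a world-leading e-commerce company, founded in 1999 and headquartered in Hangzhou, China. It is committed to providing an online trading platform for small and medium-sized enterprises worldwide, helping them expand their business and increase sales. Alibaba has several business units, including Taobao, Tmall)
# 9.8和9.11哪個數字大？答案是9.8，因為9.8比9.11大0.7。
#   (English: Which is larger, 9.8 or 9.11? The answer is 9.8, because 9.8 is 0.7 greater than 9.11.)
# Once upon a time, there was a little girl named Alice who loved to read. She had a special book that she had inherited from her grandmother, and it was filled with stories of magical creatures and far-off lands. One day, Alice decided to read the book in a
# There is a girl who likes adventure, and she is always looking for new experiences. She is a bit of a thrill-seeker, and she loves to push herself to the limit. She is always up for a challenge, and she is not afraid to take risks. She is a bit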
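The example above reassigns `text` several times, so a single run only generates for the last prompt. A minimal sketch of running all four prompts in one pass might look like the following; `PROMPTS` and `generate_all` are hypothetical names, and the function assumes the same packages and model access as the example above.

```python
PROMPTS = [
    "下面我來介紹一下阿里巴巴公司，",        # Chinese: "Let me introduce Alibaba,"
    "9.8和9.11哪個數字大？答案是",          # Chinese: "Which is larger, 9.8 or 9.11? The answer is"
    "Once upon a time,",
    "There is a girl who likes adventure,",
]


def generate_all(prompts, model_id="Intel/Qwen2-7B-int4-inc", max_new_tokens=50):
    """Run greedy decoding for each prompt and return a {prompt: completion} dict."""
    # Heavy imports are deferred so that defining this helper stays cheap.
    from auto_round import AutoRoundConfig  # noqa: F401  (needed for the auto_round format)
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    completions = {}
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        completions[prompt] = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return completions
```

Calling `generate_all(PROMPTS)` downloads the model on first use and loads it once, then reuses it for every prompt.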

Intel Gaudi-2 INT4 Inference

A Docker image with the Gaudi software stack is recommended. See the Gaudi guide for more details.

import torch
import habana_frameworks.torch.core as htcore  # Habana PyTorch bridge, needed for the "hpu" device
import habana_frameworks.torch.hpu as hthpu

from auto_round import AutoRoundConfig  # must be imported to load the auto_round format
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "Intel/Qwen2-7B-int4-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
# torch.bfloat16 replaces the undefined bare name `bfloat16`
model = AutoModelForCausalLM.from_pretrained(quantized_model_dir).to("hpu").to(torch.bfloat16)
text = "下面我來介紹一下阿里巴巴公司,"  # Chinese: "Let me introduce Alibaba,"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50, do_sample=False)[0]))

Advanced Usage

Evaluating the Model

auto-round  --model "Intel/Qwen2-7B-int4-inc"  --eval --eval_bs 16  --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k,cmmlu,ceval-valid

Metric	BF16	INT4
Average	0.6659	0.6604
mmlu	0.6697	0.6646
cmmlu	0.8254	0.8118
ceval-valid	0.8339	0.8053
lambada_openai	0.7182	0.7136
hellaswag	0.5823	0.5752
winogrande	0.7222	0.7277
piqa	0.7911	0.7933
truthfulqa_mc1	0.3647	0.3476
openbookqa	0.3520	0.3440
boolq	0.8183	0.8223
arc_easy	0.7660	0.7635
arc_challenge	0.4505	0.4633
gsm8k 5 shots (strict match)	0.7619	0.7528
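The averages and per-task differences can be recomputed from the table above. The short script below copies the scores verbatim and ranks the tasks by how much accuracy INT4 loses relative to BF16; the variable names are illustrative.

```python
# (BF16, INT4) accuracy per task, copied from the table above.
SCORES = {
    "mmlu": (0.6697, 0.6646),
    "cmmlu": (0.8254, 0.8118),
    "ceval-valid": (0.8339, 0.8053),
    "lambada_openai": (0.7182, 0.7136),
    "hellaswag": (0.5823, 0.5752),
    "winogrande": (0.7222, 0.7277),
    "piqa": (0.7911, 0.7933),
    "truthfulqa_mc1": (0.3647, 0.3476),
    "openbookqa": (0.3520, 0.3440),
    "boolq": (0.8183, 0.8223),
    "arc_easy": (0.7660, 0.7635),
    "arc_challenge": (0.4505, 0.4633),
    "gsm8k": (0.7619, 0.7528),
}

bf16_avg = sum(bf16 for bf16, _ in SCORES.values()) / len(SCORES)
int4_avg = sum(int4 for _, int4 in SCORES.values()) / len(SCORES)

# Positive drop means INT4 scored lower than BF16 on that task.
drops = sorted(((bf16 - int4, task) for task, (bf16, int4) in SCORES.items()), reverse=True)
improved = [task for drop, task in drops if drop < 0]

print(f"BF16 average: {bf16_avg:.4f}  INT4 average: {int4_avg:.4f}")
for drop, task in drops[:3]:
    print(f"largest drop: {task} ({drop:.4f})")
print(f"tasks where INT4 scored higher: {improved}")
```

The recomputed averages match the reported 0.6659 and 0.6604, and the largest single drop is on ceval-valid, one of the Chinese benchmarks.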

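Generating the Model

Below is an example command for reproducing the model. We observed a relatively large accuracy drop on Chinese tasks and recommend calibrating with a high-quality Chinese dataset. However, the public datasets we tried did not yield better accuracy.

auto-round \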
--model_name  Qwen/Qwen2-7B \
--device 0 \
--group_size 128 \
--nsamples 512 \
--bits 4 \
--iter 1000 \
--disable_eval \
--model_dtype "float16" \
--format 'auto_round' \
--output_dir "./tmp_autoround"
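The paper cited by this model card names the underlying idea in its title: weight rounding optimized via signed gradient descent. The toy sketch below illustrates that idea on a single weight group in pure Python. It is not the auto-round implementation: the data is random, the hyperparameters (`steps`, `lr`) and all function names are assumptions chosen for illustration, and the real tool works on full transformer layers with calibration data and may differ in many details.

```python
import random


def quantize(weights, offsets, scale, zero_point, qmax=15):
    """Fake-quantize: round(w / scale + offset), clamp to [0, qmax], map back to floats."""
    out = []
    for w, v in zip(weights, offsets):
        code = min(qmax, max(0, round(w / scale + v) + zero_point))
        out.append((code - zero_point) * scale)
    return out


def output_error(inputs, weights, quantized):
    """Squared output error of the quantized weights and its gradient w.r.t. them."""
    grad = [0.0] * len(weights)
    loss = 0.0
    for x in inputs:
        residual = sum(xi * (q - w) for xi, q, w in zip(x, quantized, weights))
        loss += residual * residual
        for i, xi in enumerate(x):
            grad[i] += 2.0 * residual * xi
    return loss, grad


def sign(value):
    return (value > 0) - (value < 0)


def tune_rounding(weights, inputs, steps=200, lr=0.005):
    """Learn per-weight rounding offsets in [-0.5, 0.5] with signed gradient steps."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15
    zero_point = round(-lo / scale)
    offsets = [0.0] * len(weights)            # all zeros is plain round-to-nearest
    rtn_loss, grad = output_error(inputs, weights, quantize(weights, offsets, scale, zero_point))
    best_loss, best_offsets = rtn_loss, list(offsets)
    for _ in range(steps):
        # Straight-through estimator: treat round() as identity, so the gradient
        # w.r.t. an offset has the same sign as the gradient w.r.t. the quantized weight.
        offsets = [min(0.5, max(-0.5, v - lr * sign(g))) for v, g in zip(offsets, grad)]
        loss, grad = output_error(inputs, weights, quantize(weights, offsets, scale, zero_point))
        if loss < best_loss:                  # keep the best offsets seen so far
            best_loss, best_offsets = loss, list(offsets)
    return rtn_loss, best_loss, best_offsets


random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(16)]                       # toy weight group
inputs = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]   # toy calibration inputs
rtn_loss, tuned_loss, tuned_offsets = tune_rounding(weights, inputs)
print(f"round-to-nearest error: {rtn_loss:.6f}  tuned error: {tuned_loss:.6f}")
```

Because the search starts from round-to-nearest and keeps the best offsets seen, the tuned error in this sketch can never be worse than the round-to-nearest error.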

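📚 Documentation

Ethical Considerations and Limitations

The model can produce factually incorrect output and should not be relied on for factually accurate information. Because of the limitations of the pretrained model and the fine-tuning datasets, it may generate lewd, biased, or otherwise offensive output.

Developers should therefore perform safety testing before deploying any application of the model.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

Here are some useful links for learning more about Intel's AI software:

Intel Neural Compressor link
Intel Extension for Transformers link

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.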

Citation

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}
