🚀 Qwen-7B-Chat
Qwen-7B-Chat is a large-model-based AI assistant developed by Alibaba Cloud. It is based on the Qwen-7B model and is trained with alignment techniques, offering excellent performance in various tasks such as text generation, dialogue interaction, and tool usage.
Qwen-7B 🤖 | 🤗  | Qwen-7B-Chat 🤖 | 🤗  |  Demo  |  Report
🚀 Quick Start
Introduction
Qwen-7B is the 7B-parameter version of the large language model series Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model pretrained on a large volume of data, including web text, books, code, etc. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant trained with alignment techniques. This repository is the one for Qwen-7B-Chat.
If you want to learn more details about the open-source Qwen-7B model, we recommend referring to the GitHub code repository.
Requirements
- Python 3.8 and above
- PyTorch 1.12 and above; 2.0 and above is recommended
- CUDA 11.4 and above is recommended (for GPU users, flash-attention users, etc.)
Dependency
To run Qwen-7B-Chat, please make sure you meet the above requirements, and then execute the following pip command to install the dependent libraries.

```bash
pip install transformers==4.31.0 accelerate tiktoken einops
```
In addition, it is recommended to install the flash-attention library for higher efficiency and lower memory usage.

```bash
git clone -b v1.0.8 https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
# Below are optional. Installing them might be slow.
pip install csrc/layer_norm
pip install csrc/rotary
```
Usage Examples
Basic Usage
We show an example of multi-turn interaction with Qwen-7B-Chat in the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()

# Specify hyperparameters for generation (you can set a different generation length, top_p, and other related hyperparameters here).
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)  # "Hello"
print(response)
# 你好!很高兴为你提供帮助。 ("Hello! Glad to be of help.")

# 2nd dialogue turn: ask for a story about a young person who strives to build a business and finally succeeds
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
print(response)
# (Model reply, translated from Chinese:)
# This is a story about a young person who strove to build a business and finally succeeded.
# The protagonist, Li Ming, came from an ordinary family of ordinary workers. From an early age he set himself a goal: to become a successful entrepreneur.
# To achieve it, he studied hard and got into university, where he entered entrepreneurship competitions, won several awards, and spent his free time on internships, gaining valuable experience.
# After graduating, he set out to start his own business. He looked for investment and was rejected many times, but he did not give up; he kept refining his business plan and seeking new opportunities.
# Eventually he secured an investment and founded a technology company focused on developing new software. Under his leadership the company grew rapidly into a successful tech enterprise.
# His success was no accident: he was diligent, tenacious, and willing to take risks, always learning and improving. It also shows that with hard work, anyone can succeed.

# 3rd dialogue turn: ask the model to give the story a title
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
print(response)
# 《奋斗创业:一个年轻人的成功之路》 ("Striving in Business: A Young Person's Road to Success")
```
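If you prefer streaming output, a minimal sketch is shown below; it assumes the chat_stream generator shipped with the model's remote code, which yields the decoded response as it grows:

```python
# Stream the reply: each iteration yields the full response generated so far.
# (Assumes the chat_stream method provided by the model's remote code.)
for partial_response in model.chat_stream(tokenizer, "你好", history=None):
    print(partial_response)
```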
For more information, please refer to our GitHub repo.
📚 Documentation
Tokenizer
Note: As there is no consensus on the Chinese equivalent for the term "tokenization", this document uses the English expression for clarity.
Our tokenizer, based on tiktoken, is different from other tokenizers such as the sentencepiece-based ones. You need to pay attention to special tokens, especially when fine-tuning. For more detailed information on the tokenizer and its use in fine-tuning, please refer to the documentation.
Model
The details of the model architecture of Qwen-7B-Chat are listed as follows:
Property | Details |
---|---|
n_layers | 32 |
n_heads | 32 |
d_model | 4096 |
Vocab Size | 151851 |
Sequence Length | 2048 |
For position encoding, FFN activation function, and normalization calculation methods, we adopt the prevalent practices, i.e., RoPE relative position encoding, SwiGLU for activation function, and RMSNorm for normalization (optional installation of flash-attention for acceleration).
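As a rough illustration of one of these components, here is a minimal RMSNorm sketch in PyTorch (an illustrative re-implementation for clarity, not the code shipped with the model):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization: rescales activations by the inverse RMS
    over the hidden dimension, without the mean-centering step of LayerNorm."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # inv_rms = 1 / sqrt(mean(x^2) + eps), computed along the hidden dimension
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * inv_rms)
```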
For tokenization, compared to the current mainstream open-source models based on Chinese and English vocabularies, Qwen-7B-Chat uses a vocabulary of over 150K tokens. It prioritizes efficient encoding of Chinese, English, and code data, and is also more friendly to other languages, enabling users to directly enhance the capability for some languages without expanding the vocabulary. It splits numbers into individual digits and calls the tiktoken library for efficient tokenization.
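To see this behavior concretely, a minimal sketch (the exact token ids depend on the tokenizer version):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# Mixed Chinese/English text round-trips through the tiktoken-based tokenizer.
ids = tokenizer.encode("Hello, 你好!")
print(ids)
print(tokenizer.decode(ids))

# Per the digit-splitting behavior described above, "12345" should encode to one token per digit.
print(tokenizer.encode("12345"))
```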
Evaluation
For Qwen-7B-Chat, we also evaluate the model on C-Eval, MMLU, HumanEval, GSM8K, etc., as well as on benchmarks for long-context understanding and tool usage.
Note: Due to rounding differences across hardware and frameworks, reproduced results may differ slightly.
Chinese Evaluation
C-Eval
We demonstrate the zero-shot accuracy of Qwen-7B-Chat on the C-Eval validation set:
Model | Avg. Acc. |
---|---|
LLaMA2-7B-Chat | 31.9 |
LLaMA2-13B-Chat | 40.6 |
Chinese-Alpaca-2-7B | 41.3 |
Chinese-Alpaca-Plus-13B | 43.3 |
Baichuan-13B-Chat | 50.4 |
ChatGLM2-6B-Chat | 50.7 |
InternLM-7B-Chat | 53.2 |
Qwen-7B-Chat | 54.2 |
The zero-shot accuracy of Qwen-7B-Chat on the C-Eval test set is provided below:
Model | Avg. | STEM | Social Sciences | Humanities | Others |
---|---|---|---|---|---|
Chinese-Alpaca-Plus-13B | 41.5 | 36.6 | 49.7 | 43.1 | 41.2 |
Chinese-Alpaca-2-7B | 40.3 | - | - | - | - |
ChatGLM2-6B-Chat | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |
Baichuan-13B-Chat | 51.5 | 43.7 | 64.6 | 56.2 | 49.2 |
Qwen-7B-Chat | 54.6 | 47.8 | 67.6 | 59.3 | 50.6 |
Compared with other pretrained models of comparable size, the human-aligned Qwen-7B-Chat performs well in C-Eval accuracy.
English Evaluation
MMLU
The zero-shot accuracy of Qwen-7B-Chat on MMLU is provided below. Qwen-7B-Chat remains near the top among human-aligned models of comparable size.
Model | Avg. Acc. |
---|---|
ChatGLM2-6B-Chat | 45.5 |
LLaMA2-7B-Chat | 47.0 |
InternLM-7B-Chat | 50.8 |
Baichuan-13B-Chat | 52.1 |
ChatGLM2-12B-Chat | 52.1 |
Qwen-7B-Chat | 53.9 |
Coding Evaluation
The zero-shot Pass@1 of Qwen-7B-Chat on HumanEval is shown below:
Model | Pass@1 |
---|---|
LLaMA2-7B-Chat | 12.2 |
InternLM-7B-Chat | 14.0 |
Baichuan-13B-Chat | 16.5 |
LLaMA2-13B-Chat | 18.9 |
Qwen-7B-Chat | 24.4 |
Math Evaluation
The accuracy of Qwen-7B-Chat on GSM8K is shown below:
Model | Zero-shot Acc. | 4-shot Acc. |
---|---|---|
ChatGLM2-6B-Chat | - | 28.0 |
LLaMA2-7B-Chat | 20.4 | 28.2 |
LLaMA2-13B-Chat | 29.4 | 36.7 |
InternLM-7B-Chat | 32.6 | 34.5 |
Baichuan-13B-Chat | - | 36.3 |
ChatGLM2-12B-Chat | - | 38.1 |
Qwen-7B-Chat | 41.1 | 43.5 |
Long-Context Understanding
We introduce NTK-aware interpolation and LogN attention scaling to extend the context length of Qwen-7B-Chat. The ROUGE-L results of Qwen-7B-Chat on the long-text summarization dataset VCSUM (average input length around 15K tokens) are shown below.
(To use these tricks, please set use_dynamic_ntk and use_logn_attn to true in config.json; a Python sketch follows the table.)
Model | VCSUM (zh) |
---|---|
GPT-3.5-Turbo-16k | 16.0 |
LLaMA2-7B-Chat | 0.2 |
InternLM-7B-Chat | 13.0 |
ChatGLM2-6B-Chat | 16.3 |
Qwen-7B-Chat | 16.6 |
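One way to switch these flags on from Python rather than editing config.json by hand (a sketch, assuming the standard transformers config-override path and the flag names from the note above):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the config, enable the long-context tricks, and pass it back to from_pretrained.
config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
config.use_dynamic_ntk = True   # NTK-aware interpolation
config.use_logn_attn = True     # LogN attention scaling

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", config=config, device_map="auto", trust_remote_code=True
).eval()
```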
Tool Usage
ReAct Prompting
Qwen-7B-Chat supports calling plugins/tools/APIs through ReAct Prompting. ReAct is also one of the main approaches used by the LangChain framework. In our evaluation benchmark for assessing tool usage capabilities, Qwen-7B-Chat's performance is as follows:
Model | Tool Selection (Acc.↑) | Tool Input (Rouge-L↑) | False Positive Error↓ |
---|---|---|---|
GPT-4 | 95% | 0.90 | 15% |
GPT-3.5 | 85% | 0.88 | 75% |
Qwen-7B-Chat | 99% | 0.89 | 9.7% |
The plugins that appear in the evaluation set do not appear in the training set of Qwen-7B-Chat. This benchmark evaluates the model's accuracy in selecting the correct plugin from multiple candidates, the soundness of the parameters passed to the plugin, and the false-positive rate. False positive: incorrectly invoking a plugin when it should not have been called in response to a query.
For how to write and use prompts for ReAct Prompting, please refer to the ReAct examples. The use of tools can enable the model to perform tasks more effectively.
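As a rough illustration of the prompt format, here is a hypothetical single-tool ReAct template; the tool name, its description, and the parsing step are illustrative assumptions, not the exact prompt from the ReAct examples:

```python
# A hypothetical ReAct-style prompt template; the "search" tool is illustrative.
REACT_PROMPT = """Answer the following questions as best you can. You have access to the following tools:

search: Searches the web for a query. Parameters: {"query": "the search terms"}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Begin!

Question: """

response, _ = model.chat(tokenizer, REACT_PROMPT + "What is the weather like in Beijing today?", history=None)
print(response)  # parse the Action / Action Input lines, call the real tool, then feed the Observation back in
```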
Huggingface Agent
Qwen-7B-Chat also has the capability to be used as a HuggingFace Agent. For its performance on the run-mode evaluation benchmark provided by HuggingFace, please refer to our GitHub repo.
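A minimal sketch of driving the model through the agent interface (assuming the LocalAgent class from the tools API in transformers 4.31; the instruction is illustrative):

```python
from transformers import LocalAgent

# Wrap the already-loaded model and tokenizer as a local agent.
# (LocalAgent is part of the transformers 4.31 tools API; assumed here, not Qwen-specific.)
agent = LocalAgent(model, tokenizer)

# The agent writes and executes tool-calling code to fulfil the instruction.
agent.run("Translate 'Hello' into French.")
```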