🚀 Qwen-7B-Chat
Qwen-7B-Chat is a large-model-based AI assistant developed by Alibaba Cloud. It is based on the Qwen-7B model and is trained with alignment techniques, offering excellent performance in various tasks such as text generation, dialogue interaction, and tool usage.
Qwen-7B 🤖 | 🤗  | Qwen-7B-Chat 🤖 | 🤗  |  Demo  |  Report
🚀 Quick Start
Introduction
Qwen-7B is the 7B-parameter version of the large language model series Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model pretrained on a large volume of data, including web text, books, code, etc. Additionally, based on the pretrained Qwen-7B, we release Qwen-7B-Chat, a large-model-based AI assistant trained with alignment techniques. This repository is the one for Qwen-7B-Chat.
If you want to learn more details about the open-source Qwen-7B model, we recommend referring to the GitHub code repository.
Requirements
- Python 3.8 and above
- PyTorch 1.12 and above; 2.0 and above is recommended
- CUDA 11.4 and above is recommended (for GPU users, flash-attention users, etc.)
Dependency
To run Qwen-7B-Chat, please make sure you meet the above requirements, and then execute the following pip command to install the dependent libraries.

```bash
pip install transformers==4.31.0 accelerate tiktoken einops
```
In addition, it is recommended to install the flash-attention library for higher efficiency and lower memory usage.

```bash
git clone -b v1.0.8 https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
# Below are optional. Installing them might be slow.
pip install csrc/layer_norm
pip install csrc/rotary
```
Usage Examples
Basic Usage
We show an example of multi-turn interaction with Qwen-7B-Chat in the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()

# Specify hyperparameters for generation (you can set a different generation length, top_p, and other related hyperparameters here).
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)  # "Hello"
print(response)
# 你好!很高兴为你提供帮助。 ("Hello! Glad to be of help.")

# 2nd dialogue turn: ask for a story about a young person who strives to build a business and finally succeeds
response, history = model.chat(tokenizer, "给我讲一个年轻人奋斗创业最终取得成功的故事。", history=history)
print(response)
# (Model reply, translated from Chinese:)
# This is a story about a young person who strove to build a business and finally succeeded.
# The protagonist, Li Ming, came from an ordinary family of ordinary workers. From an early age he set himself a goal: to become a successful entrepreneur.
# To achieve it, he studied hard and got into university, where he entered entrepreneurship competitions, won several awards, and spent his free time on internships, gaining valuable experience.
# After graduating, he set out to start his own business. He looked for investment and was rejected many times, but he did not give up; he kept refining his business plan and seeking new opportunities.
# Eventually he secured an investment and founded a technology company focused on developing new software. Under his leadership the company grew rapidly into a successful tech enterprise.
# His success was no accident: he was diligent, tenacious, and willing to take risks, always learning and improving. It also shows that with hard work, anyone can succeed.

# 3rd dialogue turn: ask the model to give the story a title
response, history = model.chat(tokenizer, "给这个故事起一个标题", history=history)
print(response)
# 《奋斗创业:一个年轻人的成功之路》 ("Striving in Business: A Young Person's Road to Success")
```
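If you prefer streaming output, a minimal sketch is shown below; it assumes the chat_stream generator shipped with the model's remote code, which yields the decoded response as it grows:

```python
# Stream the reply: each iteration yields the full response generated so far.
# (Assumes the chat_stream method provided by the model's remote code.)
for partial_response in model.chat_stream(tokenizer, "你好", history=None):
    print(partial_response)
```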
For more information, please refer to our GitHub repo.
📚 Documentation
Tokenizer
Note: As there is no consensus on the Chinese equivalent for the term "tokenization", this document uses the English expression for clarity.
Our tokenizer, based on tiktoken, is different from other tokenizers such as the sentencepiece-based ones. You need to pay attention to special tokens, especially when fine-tuning. For more detailed information on the tokenizer and its use in fine-tuning, please refer to the documentation.
Model
The details of the model architecture of Qwen-7B-Chat are listed as follows:
Property | Details |
---|---|
n_layers | 32 |
n_heads | 32 |
d_model | 4096 |
Vocab Size | 151851 |
Sequence Length | 2048 |
For position encoding, FFN activation function, and normalization calculation methods, we adopt the prevalent practices, i.e., RoPE relative position encoding, SwiGLU for activation function, and RMSNorm for normalization (optional installation of flash-attention for acceleration).
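As a rough illustration of one of these components, here is a minimal RMSNorm sketch in PyTorch (an illustrative re-implementation for clarity, not the code shipped with the model):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization: rescales activations by the inverse RMS
    over the hidden dimension, without the mean-centering step of LayerNorm."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # inv_rms = 1 / sqrt(mean(x^2) + eps), computed along the hidden dimension
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * inv_rms)
```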
For tokenization, compared to the current mainstream open-source models based on Chinese and English vocabularies, Qwen-7B-Chat uses a vocabulary of over 150K tokens. It prioritizes efficient encoding of Chinese, English, and code data, and is also more friendly to other languages, enabling users to directly enhance the capability for some languages without expanding the vocabulary. It splits numbers into individual digits and calls the tiktoken library for efficient tokenization.
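To see this behavior concretely, a minimal sketch (the exact token ids depend on the tokenizer version):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

# Mixed Chinese/English text round-trips through the tiktoken-based tokenizer.
ids = tokenizer.encode("Hello, 你好!")
print(ids)
print(tokenizer.decode(ids))

# Per the digit-splitting behavior described above, "12345" should encode to one token per digit.
print(tokenizer.encode("12345"))
```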
Evaluation
For Qwen-7B-Chat, we also evaluate the model on C-Eval, MMLU, HumanEval, GSM8K, etc., as well as on benchmarks for long-context understanding and tool usage.
Note: Due to rounding differences across hardware and frameworks, reproduced results may differ slightly.
Chinese Evaluation
C-Eval
We demonstrate the zero-shot accuracy of Qwen-7B-Chat on the C-Eval validation set:
Model | Avg. Acc. |
---|---|
LLaMA2-7B-Chat | 31.9 |
LLaMA2-13B-Chat | 40.6 |
Chinese-Alpaca-2-7B | 41.3 |
Chinese-Alpaca-Plus-13B | 43.3 |
Baichuan-13B-Chat | 50.4 |
ChatGLM2-6B-Chat | 50.7 |
InternLM-7B-Chat | 53.2 |
Qwen-7B-Chat | 54.2 |
The zero-shot accuracy of Qwen-7B-Chat on the C-Eval test set is provided below:
Model | Avg. | STEM | Social Sciences | Humanities | Others |
---|---|---|---|---|---|
Chinese-Alpaca-Plus-13B | 41.5 | 36.6 | 49.7 | 43.1 | 41.2 |
Chinese-Alpaca-2-7B | 40.3 | - | - | - | - |
ChatGLM2-6B-Chat | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |
Baichuan-13B-Chat | 51.5 | 43.7 | 64.6 | 56.2 | 49.2 |
Qwen-7B-Chat | 54.6 | 47.8 | 67.6 | 59.3 | 50.6 |
Compared with other pretrained models of comparable size, the human-aligned Qwen-7B-Chat performs well in C-Eval accuracy.
English Evaluation
MMLU
The zero-shot accuracy of Qwen-7B-Chat on MMLU is provided below. Qwen-7B-Chat remains near the top among human-aligned models of comparable size.
Model | Avg. Acc. |
---|---|
ChatGLM2-6B-Chat | 45.5 |
LLaMA2-7B-Chat | 47.0 |
InternLM-7B-Chat | 50.8 |
Baichuan-13B-Chat | 52.1 |
ChatGLM2-12B-Chat | 52.1 |
Qwen-7B-Chat | 53.9 |
Coding Evaluation
The zero-shot Pass@1 of Qwen-7B-Chat on HumanEval is shown below:
Model | Pass@1 |
---|---|
LLaMA2-7B-Chat | 12.2 |
InternLM-7B-Chat | 14.0 |
Baichuan-13B-Chat | 16.5 |
LLaMA2-13B-Chat | 18.9 |
Qwen-7B-Chat | 24.4 |
Math Evaluation
The accuracy of Qwen-7B-Chat on GSM8K is shown below:
Model | Zero-shot Acc. | 4-shot Acc. |
---|---|---|
ChatGLM2-6B-Chat | - | 28.0 |
LLaMA2-7B-Chat | 20.4 | 28.2 |
LLaMA2-13B-Chat | 29.4 | 36.7 |
InternLM-7B-Chat | 32.6 | 34.5 |
Baichuan-13B-Chat | - | 36.3 |
ChatGLM2-12B-Chat | - | 38.1 |
Qwen-7B-Chat | 41.1 | 43.5 |
Long-Context Understanding
We introduce NTK-aware interpolation and LogN attention scaling to extend the context length of Qwen-7B-Chat. The ROUGE-L results of Qwen-7B-Chat on the long-text summarization dataset VCSUM (average input length around 15K tokens) are shown below.
(To use these tricks, please set use_dynamic_ntk and use_logn_attn to true in config.json; a Python sketch follows the table.)
Model | VCSUM (zh) |
---|---|
GPT-3.5-Turbo-16k | 16.0 |
LLaMA2-7B-Chat | 0.2 |
InternLM-7B-Chat | 13.0 |
ChatGLM2-6B-Chat | 16.3 |
Qwen-7B-Chat | 16.6 |
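One way to switch these flags on from Python rather than editing config.json by hand (a sketch, assuming the standard transformers config-override path and the flag names from the note above):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the config, enable the long-context tricks, and pass it back to from_pretrained.
config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
config.use_dynamic_ntk = True   # NTK-aware interpolation
config.use_logn_attn = True     # LogN attention scaling

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", config=config, device_map="auto", trust_remote_code=True
).eval()
```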
Tool Usage
ReAct Prompting
Qwen-7B-Chat supports calling plugins/tools/APIs through ReAct Prompting. ReAct is also one of the main approaches used by the LangChain framework. In our evaluation benchmark for assessing tool usage capabilities, Qwen-7B-Chat's performance is as follows:
Model | Tool Selection (Acc.↑) | Tool Input (Rouge-L↑) | False Positive Error↓ |
---|---|---|---|
GPT-4 | 95% | 0.90 | 15% |
GPT-3.5 | 85% | 0.88 | 75% |
Qwen-7B-Chat | 99% | 0.89 | 9.7% |
The plugins that appear in the evaluation set do not appear in the training set of Qwen-7B-Chat. This benchmark evaluates the model's accuracy in selecting the correct plugin from multiple candidates, the soundness of the parameters passed to the plugin, and the false-positive rate. False positive: incorrectly invoking a plugin when it should not have been called in response to a query.
For how to write and use prompts for ReAct Prompting, please refer to the ReAct examples. The use of tools can enable the model to perform tasks more effectively.
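As a rough illustration of the prompt format, here is a hypothetical single-tool ReAct template; the tool name, its description, and the parsing step are illustrative assumptions, not the exact prompt from the ReAct examples:

```python
# A hypothetical ReAct-style prompt template; the "search" tool is illustrative.
REACT_PROMPT = """Answer the following questions as best you can. You have access to the following tools:

search: Searches the web for a query. Parameters: {"query": "the search terms"}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [search]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat zero or more times)
Thought: I now know the final answer
Final Answer: the final answer to the original question

Begin!

Question: """

response, _ = model.chat(tokenizer, REACT_PROMPT + "What is the weather like in Beijing today?", history=None)
print(response)  # parse the Action / Action Input lines, call the real tool, then feed the Observation back in
```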
Huggingface Agent
Qwen-7B-Chat also has the capability to be used as a HuggingFace Agent. For its performance on the run-mode evaluation benchmark provided by HuggingFace, please refer to our GitHub repo.
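A minimal sketch of driving the model through the agent interface (assuming the LocalAgent class from the tools API in transformers 4.31; the instruction is illustrative):

```python
from transformers import LocalAgent

# Wrap the already-loaded model and tokenizer as a local agent.
# (LocalAgent is part of the transformers 4.31 tools API; assumed here, not Qwen-specific.)
agent = LocalAgent(model, tokenizer)

# The agent writes and executes tool-calling code to fulfil the instruction.
agent.run("Translate 'Hello' into French.")
```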