Baichuan-7B
Baichuan-7B is an open-source large-scale pretrained model developed by Baichuan Intelligent Technology. It is based on the Transformer architecture and trained on approximately 1.2 trillion tokens. The model supports both Chinese and English, has a context window of 4096 tokens, and achieves the best performance among models of its size on authoritative Chinese and English benchmarks.
Quick Start
If you wish to use Baichuan-7B (for inference, finetuning, etc.), we recommend using the accompanying code in the official Baichuan-7B repository.
Features
- Among models of the same size, Baichuan-7B achieves the current state-of-the-art (SOTA) level, as evidenced by the MMLU results below.
- Baichuan-7B is trained on proprietary bilingual Chinese-English corpora, is optimized for Chinese, and achieves SOTA performance on C-Eval.
- Unlike LLaMA, whose license completely prohibits commercial use, Baichuan-7B employs a more permissive open-source license that allows commercial use.
Usage Examples
Basic Usage
The following performs 1-shot inference with Baichuan-7B: given the title of a work, the model must produce its author. The correct output is "夜雨寄北->李商隐" (the poem "夜雨寄北" was written by Li Shangyin).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", device_map="auto", trust_remote_code=True)

# One exemplar ("登鹳雀楼" -> 王之涣) demonstrates the work->author mapping;
# the model is then asked to complete the author of "夜雨寄北".
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```
Advanced Usage
The following performs the same 1-shot inference task in English, with the correct output being "One Hundred Years of Solitude->Gabriel Garcia Marquez".
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", device_map="auto", trust_remote_code=True)

# One exemplar (Hamlet -> Shakespeare) demonstrates the work->author mapping;
# the model is then asked to complete the author of "One Hundred Years of Solitude".
inputs = tokenizer('Hamlet->Shakespeare\nOne Hundred Years of Solitude->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```
Documentation
Model Description
- Developed by: Baichuan Intelligent Technology
- Email: opensource@baichuan-inc.com
- Language(s) (NLP): Chinese/English
- License: Baichuan-7B License
Model Architecture
The overall model is based on the standard Transformer structure, and we have adopted the same model design as LLaMA:
- Position Embedding: We use rotary position embedding (RoPE), the position-encoding scheme adopted by most current models, which has excellent extrapolation capabilities.
- Feedforward Layer: We use SwiGLU. The feed-forward hidden size is 8/3 times the model dimension (8/3 × 4096 ≈ 10923), rounded up to a multiple of 256, giving 11008.
- Layer Normalization: Pre-normalization based on RMSNorm; a minimal sketch of this and the SwiGLU block follows below.
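The following is a minimal PyTorch sketch of the RMSNorm and SwiGLU feed-forward blocks described above, using the dimensions quoted in this section. The class and variable names are ours, for illustration only; the authoritative implementation ships with the model code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean-centering and no bias term."""
    def __init__(self, dim: int = 4096, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Divide by the RMS of the activations, then apply a learned scale.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLUFeedForward(nn.Module):
    """LLaMA-style gated feed-forward: down(silu(gate(x)) * up(x))."""
    def __init__(self, d_model: int = 4096, d_ff: int = 11008):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```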
The specific parameters are as follows:
| Hyperparameter | Value |
|---|---|
| n_parameters | 7000559616 |
| n_layers | 32 |
| n_heads | 32 |
| d_model | 4096 |
| vocab size | 64000 |
| sequence length | 4096 |
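These shapes are consistent with the stated parameter count. Below is a quick back-of-the-envelope check, assuming LLaMA-style bias-free linear layers and untied input/output embedding matrices (our assumption, not stated in the table):

```python
d_model, n_layers, vocab, d_ff = 4096, 32, 64000, 11008

embed      = vocab * d_model        # input token embeddings
attn       = 4 * d_model * d_model  # Q, K, V, O projections (per layer)
ffn        = 3 * d_model * d_ff     # gate, up, down projections (per layer)
norms      = 2 * d_model            # two RMSNorm weight vectors (per layer)
lm_head    = vocab * d_model        # output projection (untied)
final_norm = d_model                # RMSNorm before the output head

total = embed + n_layers * (attn + ffn + norms) + final_norm + lm_head
print(total)  # 7000559616, exactly the n_parameters above
```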
Uses
Downstream Use
We have also open-sourced the training code that accompanies this model, allowing for efficient finetuning for downstream tasks. For more details, please refer to Baichuan-7B.
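For orientation only, a generic causal-LM finetuning loop with the Hugging Face Trainer might look like the sketch below. This is not the official training code (which lives in the Baichuan-7B repository); the dataset file, sequence length, and hyperparameters are placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-7B", trust_remote_code=True)

# The collator pads batches, so make sure a pad token exists.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus; replace "train.txt" with your downstream-task data.
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-5, bf16=True),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```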
Out-of-Scope Use
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
Bias, Risks, and Limitations
Baichuan-7B can produce factually incorrect output, and should not be relied on to produce factually accurate information. Baichuan-7B was trained on various public datasets. While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
Training Details
For specific training settings, please refer to Baichuan-7B.
Evaluation
Chinese Evaluation
C-Eval
The C-Eval dataset is a comprehensive Chinese evaluation suite for foundation models, covering 52 disciplines and four difficulty levels. We used its dev split as the source of few-shot exemplars and ran a 5-shot test on the test split.
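For concreteness, the sketch below shows one common way to run this kind of few-shot multiple-choice evaluation: build a prompt from dev-set exemplars and score each answer letter by its next-token logit. The prompt template and the field names `question`, `choices`, and `answer` are hypothetical; the exact template and scoring behind the numbers below are determined by the evaluation code.

```python
import torch

def build_prompt(shots, question, choices):
    """Assemble a 5-shot multiple-choice prompt from dev-set exemplars.
    `shots` is a list of dicts with hypothetical 'question', 'choices'
    (letter -> text), and 'answer' fields."""
    blocks = []
    for ex in shots[:5] + [{"question": question, "choices": choices, "answer": ""}]:
        lines = [ex["question"]]
        lines += [f"{k}. {v}" for k, v in ex["choices"].items()]
        lines.append(f"Answer: {ex['answer']}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks).rstrip()

@torch.no_grad()
def pick_answer(model, tokenizer, prompt, letters=("A", "B", "C", "D")):
    """Choose the letter whose token has the highest next-token logit."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    next_logits = model(**inputs).logits[0, -1]
    scores = {l: next_logits[tokenizer(l, add_special_tokens=False).input_ids[-1]].item()
              for l in letters}
    return max(scores, key=scores.get)
```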
| Model (5-shot) | Average | Avg(Hard) | STEM | Social Sciences | Humanities | Others |
|---|---|---|---|---|---|---|
| GPT-4 | 68.7 | 54.9 | 67.1 | 77.6 | 64.5 | 67.8 |
| ChatGPT | 54.4 | 41.4 | 52.9 | 61.8 | 50.9 | 53.6 |
| Claude-v1.3 | 54.2 | 39.0 | 51.9 | 61.7 | 52.1 | 53.7 |
| Claude-instant-v1.0 | 45.9 | 35.5 | 43.1 | 53.8 | 44.2 | 45.4 |
| moss-moon-003-base (16B) | 27.4 | 24.5 | 27.0 | 29.1 | 27.2 | 26.9 |
| Ziya-LLaMA-13B-pretrain | 30.2 | 22.7 | 27.7 | 34.4 | 32.0 | 28.9 |
| LLaMA-7B-hf | 27.1 | 25.9 | 27.1 | 26.8 | 27.9 | 26.3 |
| ChatGLM-6B | 34.5 | 23.1 | 30.4 | 39.6 | 37.4 | 34.5 |
| Falcon-7B | 25.8 | 24.3 | 25.8 | 26.0 | 25.8 | 25.6 |
| Open-LLaMA-v2-pretrain (7B) | 24.0 | 22.5 | 23.1 | 25.3 | 25.2 | 23.2 |
| TigerBot-7B-base | 25.7 | 27.0 | 27.3 | 24.7 | 23.4 | 26.1 |
| Aquila-7B* | 25.5 | 25.2 | 25.6 | 24.6 | 25.2 | 26.6 |
| BLOOM-7B | 22.8 | 20.2 | 21.8 | 23.3 | 23.9 | 23.3 |
| BLOOMZ-7B | 35.7 | 25.8 | 31.3 | 43.5 | 36.6 | 35.6 |
| Baichuan-7B | 42.8 | 31.5 | 38.2 | 52.0 | 46.2 | 39.3 |
Gaokao
Gaokao is a dataset built from Chinese college entrance examination questions, designed to assess the language understanding and logical reasoning abilities of large language models. We retained only the single-choice questions and ran a unified 5-shot test on all models.
The following are the test results:
| Model | Average |
|---|---|
| Open-LLaMA-v2-pretrain | 21.41 |
| Ziya-LLaMA-13B-pretrain | 23.17 |
| Falcon-7B | 23.98 |
| TigerBot-7B-base | 25.94 |
| LLaMA-7B | 27.81 |
| ChatGLM-6B | 21.41 |
| BLOOM-7B | 26.96 |
| BLOOMZ-7B | 28.72 |
| Aquila-7B* | 24.39 |
| Baichuan-7B | 36.24 |
AGIEval
AGIEval aims to evaluate a model's general abilities on cognition- and problem-solving-related tasks. We retained only the four-option single-choice questions, split them randomly, and ran a unified 5-shot test on all models.
| Model | Average |
|---|---|
| Open-LLaMA-v2-pretrain | 23.49 |
| Ziya-LLaMA-13B-pretrain | 27.64 |
| Falcon-7B | 27.18 |
| TigerBot-7B-base | 25.19 |
| LLaMA-7B | 28.17 |
| ChatGLM-6B | 23.49 |
| BLOOM-7B | 26.55 |
| BLOOMZ-7B | 30.27 |
| Aquila-7B* | 25.58 |
| Baichuan-7B | 34.44 |
* The Aquila results are taken from the official ZhiYuan (BAAI) website and are for reference only.
English Leaderboard
In addition to Chinese, we also tested the model's performance in English.
MMLU
MMLU is an English evaluation dataset that includes 57 multiple-choice tasks, covering elementary mathematics, American history, computer science, law, etc. The difficulty ranges from high school level to expert level, making it a mainstream LLM evaluation dataset.
We adopted the open-source evaluation scheme, and the final 5-shot results are as follows:
| Model | Humanities | Social Sciences | STEM | Other | Average |
|---|---|---|---|---|---|
| LLaMA-7B<sup>2</sup> | 34.0 | 38.3 | 30.5 | 38.1 | 35.1 |
| Falcon-7B<sup>1</sup> | - | - | - | - | 35.0 |
| mpt-7B<sup>1</sup> | - | - | - | - | 35.6 |
| ChatGLM-6B<sup>0</sup> | 35.4 | 41.0 | 31.3 | 40.5 | 36.9 |
| BLOOM 7B<sup>0</sup> | 25.0 | 24.4 | 26.5 | 26.4 | 25.5 |
| BLOOMZ 7B<sup>0</sup> | 31.3 | 42.1 | 34.4 | 39.0 | 36.1 |
| moss-moon-003-base (16B)<sup>0</sup> | 24.2 | 22.8 | 22.4 | 24.4 | 23.6 |
| moss-moon-003-sft (16B)<sup>0</sup> | 30.5 | 33.8 | 29.3 | 34.4 | 31.9 |
| Baichuan-7B<sup>0</sup> | 38.4 | 48.9 | 35.6 | 48.1 | 42.3 |
The superscript in the Model column indicates the source of the results:
- 0: reimplemented
- 1: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
- 2: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu