🚀 Star Semantic Large Model - TeleChat
Star Semantic Large Model - TeleChat is a large language model developed and trained by China Telecom Artificial Intelligence Technology Co., Ltd. It offers high-performance language processing capabilities, with its 7B and 12B models trained on large-scale, high-quality Chinese and English corpora.
🚀 Quick Start
Latest News
- March 20, 2024: Open-sourced the 12B version chat model and its quantized versions.
- January 11, 2024: Open-sourced a 1T Chinese dataset.
- January 10, 2024: Open-sourced the 7B version chat model and its quantized versions.
Model Introduction
Star Semantic Large Model - TeleChat
- The Star Semantic Large Model TeleChat is a large language model developed and trained by China Telecom Artificial Intelligence Technology Co., Ltd. The 7B model base is trained on 1.5 trillion tokens of high-quality Chinese and English corpora, and the 12B model base is trained on 3 trillion tokens of high-quality Chinese and English corpora.
- We have open-sourced the dialogue models TeleChat-7B-bot and TeleChat-12B-bot, together with their Hugging Face format weight files. We have also open-sourced int8 and int4 quantized versions of both models.
- TeleChat-12B-bot improves on TeleChat-7B-bot in model structure, training data, and training method, and significantly outperforms it on general Q&A, knowledge, code, and math leaderboards.
  - Model structure: we used small-scale models to experiment with combinations of structural choices and selected the best one. Unlike TeleChat-7B-bot, TeleChat-12B-bot decouples the word embedding layer from the output lm_head layer, keeping their parameters separate, which improves training stability and convergence.
  - Training data: we collected a large volume of Chinese and English data covering books, encyclopedias, news, government affairs, law, medicine, patents, papers, mathematics, code, and many other fields. An optimized data-cleaning strategy greatly improved the text cleanliness, viewpoint neutrality, content validity, and format consistency of the data.
  - Training method: we used principled data-ratio learning and curriculum learning. Small-parameter models were fit on various data mixtures to obtain prior estimates of each dataset's difficulty; during training, the current model's loss on every dataset and its generation quality on the evaluation set were evaluated at regular intervals, and the weights of harder-to-learn datasets were increased dynamically so that the model fits each dataset well (a rough sketch of this reweighting idea follows this list).
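The dynamic reweighting described above is summarized here only as a minimal, hypothetical sketch: the function name, the softmax-over-loss boost, and the temperature parameter are illustrative assumptions, not the actual TeleChat training code.

```python
import math

def reweight_datasets(base_weights, measured_losses, temperature=1.0):
    """Boost sampling weights for datasets the model currently fits poorly.

    base_weights:    {dataset_name: prior mixing ratio}  (illustrative)
    measured_losses: {dataset_name: current eval loss}   (illustrative)
    """
    # Softmax over losses: relative difficulty, not absolute scale, drives the boost.
    z = sum(math.exp(loss / temperature) for loss in measured_losses.values())
    boost = {k: math.exp(loss / temperature) / z for k, loss in measured_losses.items()}
    # Harder datasets (higher loss) receive a larger share; renormalize to sum to 1.
    raw = {k: base_weights[k] * (1.0 + boost[k]) for k in base_weights}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}

# Example: code data currently has the highest loss, so its weight is nudged upward.
print(reweight_datasets(
    base_weights={"books": 0.4, "web": 0.4, "code": 0.2},
    measured_losses={"books": 1.8, "web": 2.0, "code": 2.6},
))
```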
Model Structure
We designed TeleChat as a standard decoder-only Transformer and made the following choices in the model architecture:
- Positional Encoding: we use Rotary Embedding, which injects relative position information into self-attention and extrapolates well to longer positions. Rotary Embedding also combines well with FlashAttention v2, speeding up model training by about 20%.
- Activation Function: we use the SwiGLU activation function instead of GELU. To reduce computation, the ffn_hidden_size is set below the 4x hidden size used in the original SwiGLU formulation.
- Layer Normalization: pre-normalization based on RMSNorm.
- Decoupled Word Embedding and Output Layer: in the TeleChat-12B-bot model the word embedding layer and the output lm_head layer do not share parameters, which improves training stability and convergence.

A minimal code sketch of these components follows the hyperparameter table below.
| Model | layer_num | hidden_size | ffn_hidden_size | head_num | tie_word_embeddings |
|---|---|---|---|---|---|
| 7B | 30 | 4096 | 12288 | 32 | Yes |
| 12B | 38 | 5120 | 12288 | 32 | No |
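The sketch below maps the structural choices above to code. It is a simplified illustration under assumed layer names and shapes, not the released modeling code; the attention wiring, biases, and initialization are omitted.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Pre-normalization used in place of LayerNorm."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * self.weight

class SwiGLU(nn.Module):
    """SwiGLU MLP; ffn_hidden_size (12288) is set below 4 * hidden_size."""
    def __init__(self, hidden_size, ffn_hidden_size):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
        self.down_proj = nn.Linear(ffn_hidden_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))

def rotary_embedding(x, base=10000.0):
    """Apply rotary position embedding to a (batch, seq, heads, head_dim) tensor."""
    _, seq_len, _, head_dim = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = torch.einsum("s,f->sf", torch.arange(seq_len, dtype=torch.float32), inv_freq)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Rotate each even/odd channel pair by its position-dependent angle.
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)

# Example with the 7B shapes from the table: 32 heads, hidden_size 4096 -> head_dim 128.
q = rotary_embedding(torch.randn(1, 16, 32, 128))
h = SwiGLU(4096, 12288)(RMSNorm(4096)(torch.randn(1, 16, 4096)))
```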
The open-sourced TeleChat models support:
- DeepSpeed fine-tuning: we have open-sourced DeepSpeed-based training code that supports ZeRO memory optimization and integrates FlashAttention2.
- Multi-turn dialogue: we have open-sourced the multi-turn data construction method and train with a mask loss so that training focuses on the multi-turn answers, improving Q&A quality (a rough illustration follows this list).
- Long-context extrapolation: we have open-sourced an 8K training version that can be extrapolated to a 96K context using NTK-aware extrapolation and attention-scaling extrapolation.
- Long-text generation: the models perform well on long-form writing tasks such as work summaries, work plans, PPT outlines, essays, tender documents, emails, proposals, weekly reports, and job descriptions.
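As a rough illustration of the mask-loss idea (not the released training pipeline), the sketch below masks user turns so that only the assistant's tokens contribute to the loss; the turn format, role names, and the use of -100 as the ignore index are assumptions.

```python
# Minimal sketch of mask-loss construction for multi-turn dialogue training.
IGNORE_INDEX = -100  # ignored by PyTorch's CrossEntropyLoss

def build_multiturn_example(tokenizer, turns, max_length=8192):
    """turns: list of (role, text) pairs with role in {"user", "bot"} (illustrative format)."""
    input_ids, labels = [], []
    for role, text in turns:
        ids = tokenizer.encode(text, add_special_tokens=False)
        input_ids.extend(ids)
        if role == "bot":
            labels.extend(ids)                        # learn to produce the answer
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # do not train on the question tokens
    return {"input_ids": input_ids[:max_length], "labels": labels[:max_length]}
```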
The following table shows the released versions and download links:
| Model Version | Download Link |
|---|---|
| 7B-FP16 | [TeleChat-7B-FP16](https://huggingface.co/Tele-AI/Telechat-7B) |
| 7B-int8 | [TeleChat-7B-int8](https://huggingface.co/Tele-AI/Telechat-7B-int8) |
| 7B-int4 | [TeleChat-7B-int4](https://huggingface.co/Tele-AI/Telechat-7B-int4) |
| 12B-FP16 | [TeleChat-12B-FP16](https://huggingface.co/Tele-AI/TeleChat-12B) |
| 12B-int8 | [TeleChat-12B-int8](https://huggingface.co/Tele-AI/TeleChat-12B-int8) |
| 12B-int4 | [TeleChat-12B-int4](https://huggingface.co/Tele-AI/TeleChat-12B-int4) |
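If you prefer to fetch the weights ahead of time rather than at first load, the standard huggingface_hub download shown below works with the repository IDs from the table above; the local directory is an arbitrary choice.

```python
from huggingface_hub import snapshot_download

# Fetch the 12B FP16 weights listed above into a local directory.
snapshot_download(repo_id="Tele-AI/TeleChat-12B", local_dir="./models/TeleChat-12B")
```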
Data Open - Source
Data Introduction
TeleChat-PTD is a comprehensive large-scale Chinese dataset extracted from the pre-training corpus of the Star Semantic Large Model TeleChat. The data mainly comes from web pages, books, official media, and similar sources. We used a combination of rules and models for filtering, and removed near-duplicate documents, to retain data of the highest possible quality.
The TeleChat-PTD release contains approximately 270 million samples of pure Chinese text. The raw data is about 1 TB; compressed, it is about 480 GB across 189 files. Redundant information has been removed from the dataset.
Data Download
Huggingface download address: [TeleChat-PTD](https://huggingface.co/datasets/Tele-AI/TeleChat-PTD)
China Telecom Cloud Disk download address: Data Download (access code: pkg8)
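Given the dataset's size, streaming it from the Hub is a convenient way to inspect a few records before committing to a full download. This is a minimal sketch using the datasets library; the split name and the record schema are assumptions based on the usual Hub layout.

```python
from datasets import load_dataset

# Stream TeleChat-PTD without downloading all 189 files up front.
ptd = load_dataset("Tele-AI/TeleChat-PTD", split="train", streaming=True)
for i, example in enumerate(ptd):
    print(example)  # the record schema is defined on the dataset card
    if i >= 2:
        break
```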
Effect Evaluation
Compared with models of similar scale, the TeleChat models also perform well in evaluation. Our evaluation set covers datasets such as MMLU, C-Eval, GAOKAO, AGIEval, CMMLU, GSM8K, MATH, HumanEval, and CHID, spanning natural language understanding, knowledge, mathematical calculation and reasoning, code generation, and other capabilities.
The following table shows the evaluation results:
| Model | MMLU | C-Eval | CMMLU | AGIEval | GAOKAO | GSM8K | MATH | HumanEval | CSL | CHID | EPRSTMT | BBH | HellaSwag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 5-shot | 5-shot | 5-shot | zero-shot | zero-shot | 4-shot | 4-shot | zero-shot | zero-shot | zero-shot | zero-shot | 3-shot | zero-shot |
| LLaMA2-7B-chat | 46.2 | 31.9 | 31.5 | 28.5 | 16.1 | 26.3 | 3.9 | 12.2 | 58.8 | 44.1 | 57.5 | 35.6 | 74.1 |
| LLaMA2-13B-chat | 54.6 | 36.2 | 38.7 | 32.3 | 18.6 | 29.6 | 5.0 | 18.9 | 61.2 | 48.0 | 59.4 | 40.2 | 78.2 |
| ChatGLM2-6B-chat | 45.9 | 52.6 | 49.3 | 39.0 | 46.4 | 28.8 | 6.5 | 11.0 | 61.2 | 57.9 | 71.2 | 32.7 | 57.0 |
| ChatGLM3-6B-chat | 51.9 | 53.8 | 54 | 38.9 | 49.3 | 56.7 | 18.7 | 61 | 65.6 | 63.4 | 85 | 44.6 | 62.7 |
| Baichuan2-7B-chat | 52.8 | 55.6 | 54.0 | 35.3 | 39.7 | 32.8 | 6 | 13.4 | 60 | 75.2 | 87.5 | 35.8 | 61.6 |
| Baichuan2-13B-chat | 57 | 56.7 | 58.4 | 40 | 51.4 | 55.3 | 8.6 | 17.7 | 63.1 | 78.2 | 87.5 | 49.9 | 66.9 |
| Qwen-7B-chat | 56.6 | 59.3 | 59.5 | 41.3 | 63.3 | 52.5 | 10.3 | 26.2 | 63.1 | 72.3 | 88.8 | 46.9 | 59.9 |
| Qwen-14B-chat | 66.4 | 71.7 | 70.0 | 47.3 | 76.5 | 61.0 | 26.8 | 36.6 | 55.6 | 72.3 | 91.2 | 58.0 | 65.2 |
| TeleChat-7B-chat | 60.5 | 64.6 | 64.3 | 46.8 | 59 | 36.7 | 10.3 | 20.1 | 66.8 | 88.0 | 87.5 | 19.5 | 36.7 |
| TeleChat-12B-chat | 73.3 | 66.6 | 74.2 | 51.7 | 53.1 | 57.2 | 16.0 | 22.0 | 60.6 | 83.2 | 86.3 | 52.2 | 71.5 |
Note: CMMLU, AGIEval, GAOKAO, CSL, CHID, and EPRSTMT are evaluated with the evaluation method provided by the [OpenCompass](https://github.com/open-compass/OpenCompass/) platform. For the comparison models, we also refer to officially reported results and OpenCompass results. We used our own evaluation script for the MMLU and C-Eval leaderboards; the specific method can be found in the evaluation/ folder.
Model Inference
```python
>>> import os
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
>>> os.environ["CUDA_VISIBLE_DEVICES"] = '0'
>>> tokenizer = AutoTokenizer.from_pretrained('../models/7B')
>>> model = AutoModelForCausalLM.from_pretrained('../models/7B', trust_remote_code=True, device_map="auto", torch_dtype=torch.float16)
>>> generate_config = GenerationConfig.from_pretrained('../models/7B')
>>> question = "What's the difference between light soy sauce and dark soy sauce?"
>>> answer, history = model.chat(tokenizer=tokenizer, question=question, history=[], generation_config=generate_config, stream=False)
>>> print(answer)
Light soy sauce and dark soy sauce are two different types of soy sauce, and their differences are as follows:
1. Different raw materials: Light soy sauce is made from soybeans, wheat, and other grains; while dark soy sauce is made from fermented seasonings such as soybean paste and flour paste.
2. Different production processes: Light soy sauce is made by soaking soybeans in water and then going through processes such as steaming and fermentation; while dark soy sauce is made by adding a certain proportion of salt, sugar, monosodium glutamate, and other seasonings to light soy sauce and then fermenting it.
3. Different tastes and flavors: Light soy sauce has a salty and fresh taste and a relatively refreshing mouthfeel; while dark soy sauce has a special aroma and taste and a relatively heavier mouthfeel.
In general, light soy sauce and dark soy sauce are different types of soy sauce, and they differ in terms of raw materials, production processes, and tastes.
```
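Continuing the example above, the returned history can be passed back in to carry the conversation into a second turn. This reuses only the chat signature already shown; the follow-up question itself is just an illustration.

```python
>>> follow_up = "Which one is better suited for braised dishes?"
>>> answer, history = model.chat(tokenizer=tokenizer, question=follow_up, history=history, generation_config=generate_config, stream=False)
>>> print(answer)
```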
Statement, Agreement, Citation
Statement
We hereby declare that the TeleChat model and its derivative models should not be used for any activities that endanger national and social security or violate the law. At the same time, we also require users not to use the TeleChat model for Internet services without safety review and filing. We hope that all users abide by the above principles to ensure that technological development takes place in a legal and compliant environment.
We have done our best to ensure the compliance of the data used in the model training process. However, due to the complexity of the model and data, there may still be unforeseeable problems. Therefore, we will not be responsible for any problems caused by the use of the open-sourced TeleChat model, including but not limited to data security issues, public opinion risks, or any risks and problems caused by the model being misled, misused, spread, or improperly exploited.
Agreement
The community must follow the TeleChat Model Community License Agreement when using the TeleChat model. The TeleChat model supports commercial use. If you plan to use the TeleChat model or its derivatives for commercial purposes, you need to submit the application materials required by the TeleChat Model Community License Agreement via the contact email tele_ai@chinatelecom.cn. After the review is approved, you will be granted a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable commercial copyright license.
Citation
If you need to cite our work, please use the following reference:
```
@misc{wang2024telechat,
      title={TeleChat Technical Report},
      author={Zihan Wang and Xinzhang Liu and Shixuan Liu and Yitong Yao and Yuyao Huang and Zhongjiang He and Xuelong Li and Yongxiang Li and Zhonghao Che and Zhaoxi Zhang and Yan Wang and Xin Wang and Luwen Pu and Huihan Xu and Ruiyu Fang and Yu Zhao and Jie Zhang and Xiaomeng Huang and Zhilong Lu and Jiaxin Peng and Wenjun Zheng and Shiquan Wang and Bingkai Yang and Xuewei he and Zhuoru Jiang and Qiyi Xie and Yanhan Zhang and Zhongqiu Li and Lingling Shi and Weiwei Fu and Yin Zhang and Zilu Huang and Sishi Xiong and Yuxiang Zhang and Chao Wang and Shuangyong Song},
      year={2024},
      eprint={2401.03804},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
📄 License
This project is licensed under the Apache-2.0 license.

