🚀 CausalLM 7B - Fully Compatible with Meta LLaMA 2
CausalLM 7B is a powerful model fully compatible with Meta LLaMA 2, offering high-performance text generation and strong results across multiple evaluations.
📄 License
The model is licensed under the WTFPL license.
📦 Datasets
The model was trained using the following datasets:
- JosephusCheung/GuanacoDataset
- Open-Orca/OpenOrca
- stingning/ultrachat
- meta-math/MetaMathQA
- liuhaotian/LLaVA-Instruct-150K
- jondurbin/airoboros-3.1
- WizardLM/WizardLM_evol_instruct_V2_196k
- RyokoAI/ShareGPT52K
- RyokoAI/Fandom23K
- milashkaarshif/MoeGirlPedia_wikitext_raw_archive
- wikipedia
- wiki_lingua
- fnlp/moss-003-sft-data
- garage-bAInd/Open-Platypus
- LDJnr/Puffin
- openbmb/llava_zh
- BAAI/COIG
- TigerResearch/tigerbot-zhihu-zh-10k
- liwu/MNBVC
- teknium/openhermes
🌐 Language
The model supports the following languages:
🚀 Quick Start
Use the `transformers` library to load the model; no remote/external code is required. You can use `AutoModelForCausalLM` and `AutoTokenizer`, or manually specify `LlamaForCausalLM` for the language model and `GPT2Tokenizer` for the tokenizer. Model quantization is fully compatible with GGUF (llama.cpp), GPTQ, and AWQ.
✨ Features
Recent Updates
The [DPO-α Version](https://huggingface.co/CausalLM/7B-DPO-alpha) outperforms Zephyr-β on MT-Bench.
llama.cpp GGUF models
The `GPT2Tokenizer` was fixed by Kerfuffle in https://github.com/ggerganov/llama.cpp/pull/3743, and new models have been re-uploaded. Thanks to TheBloke for the GGUF quantized models: [https://huggingface.co/TheBloke/CausalLM-7B-GGUF](https://huggingface.co/TheBloke/CausalLM-7B-GGUF).
Training Details
The model was trained based on the weights of Qwen (and LLaMA2 weights were also used for some initial weight calculations). You may need to comply with the commercial use restrictions of these two models depending on the situation. The training process used the same model architecture as LLaMA2, the same attention calculation method as the original MHA LLaMA2 models, and no additional scaling was applied to the Rotary Positional Encoding (RoPE).
Dataset Curation
A manually curated SFT dataset of 1.3B tokens was used for training, leveraging open-source datasets from Hugging Face. Most sentences were manually or synthetically rewritten, and alternate-language versions were generated with larger language models. Augmented text training was also conducted using carefully selected Wikipedia entries, featured Fandom entries, and filtered Moegirlpedia entries. To balance efficiency and quality, 100% of the training data was synthetic; no internet text or original text from publicly available datasets was used directly for fine-tuning.
Model Distillation
The 7B version of the model is a distilled version of the 14B model, specifically designed for speculative sampling. Caution should be exercised when directly using the model, as it may produce hallucinations or unreliable outputs.
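To illustrate the speculative-sampling role mentioned above, here is a simplified, greedy sketch of the idea: a cheap draft model proposes several tokens, and the target model verifies them, accepting the matching prefix. The toy next-token functions below are placeholders for real models, and real implementations accept or reject against probability ratios rather than greedy matches.

```python
def greedy_speculative_decode(target_next, draft_next, prompt, k=4, max_new=8):
    """Greedy speculative decoding sketch.

    target_next/draft_next: callables mapping a token sequence to the next token.
    The draft proposes k tokens; the target verifies them (in practice, in one
    parallel forward pass). Matching tokens are accepted; on the first mismatch,
    the target's own token is taken instead and verification stops.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # Draft model proposes k tokens autoregressively (cheap).
        draft_seq = list(seq)
        for _ in range(k):
            draft_seq.append(draft_next(draft_seq))
        proposals = draft_seq[len(seq):]

        # Target model verifies each proposed position (expensive, but batched).
        accepted = []
        for tok in proposals:
            t = target_next(seq + accepted)
            if t == tok:
                accepted.append(tok)      # draft and target agree: accept
            else:
                accepted.append(t)        # disagree: take target's token, stop
                break
        seq.extend(accepted)
    return seq[:len(prompt) + max_new]
```

When the draft model agrees with the target often (as a distilled model should), most proposed tokens are accepted and the target model runs far fewer sequential steps.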
Safety Considerations
The model was trained on unfiltered internet data. Since we cannot vet all of it, there may be a substantial amount of objectionable content, pornography, violence, and offensive language that we are unable to remove. You will still need to check the model's output for safety and filter keywords. Due to computational resource constraints, we are currently unable to implement RLHF for the model's ethics and safety, nor to train on SFT samples that refuse to answer certain questions for restrictive fine-tuning.
Multimodal Potential
The model underwent some fine-tuning on the prompt format introduced in LLaVA-1.5, unrelated to image attention calculation. Aligning the ViT projection module with a frozen LM under visual instructions would enable rapid implementation of effective multimodal capabilities.
💡 Usage Tip
PROMPT FORMAT
Use the [ChatML](https://github.com/openai/openai-python/blob/main/chatml.md) format. The System Prompt must not be empty!
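A small helper sketch for assembling a ChatML prompt (the function name is illustrative; the `<|im_start|>`/`<|im_end|>` markers follow the ChatML format, and the check reflects the card's requirement that the system prompt not be empty):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Wrap one system/user exchange in ChatML turn markers and
    leave the prompt open at the assistant turn for generation."""
    # Per this model card, the system prompt must not be empty.
    assert system.strip(), "System prompt must not be empty for this model"
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```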
📊 Evaluation Results
MMLU
| Category | Accuracy |
|----------|----------|
| STEM | 56.83 |
| Humanities | 58.79 |
| Other | 70.04 |
| Social | 72.41 |
| **AVERAGE** | **63.82** |
The model outperforms or equals the best Mistral-7B chat-style fine-tunes, ChatGLM3-6B, and all other models under 33B.
CEval (Val)
| Category | Accuracy |
|----------|----------|
| STEM | 61.67 |
| Social Science | 81.94 |
| Humanities | 77.19 |
| Other | 68.35 |
| Hard | 48.03 |
| **AVERAGE** | **70.27** |
The model outperforms all current 7B models, including ChatGLM3-6B.
GSM8K
Zero-shot accuracy: 59.21% (outperforms WizardMath-7B and Qwen-7B)
MT-Bench on DPO Version
| Model | MT-Bench |
|-------|----------|
| GPT-4 | 8.99 |
| GPT-3.5-Turbo | 7.94 |
| Zephyr-7b-β (overfitting) | 7.34 |
| Zephyr-7b-α | 6.88 |
| [CausalLM/14B-DPO-α](https://huggingface.co/CausalLM/14B-DPO-alpha) | 7.618868 |
| [CausalLM/7B-DPO-α](https://huggingface.co/CausalLM/7B-DPO-alpha) | 7.038125 |
Image Reference
*Image drawn by GPT-4 DALL·E 3.*
TL;DR: Perhaps this 7B model is better than all existing models ≤ 33B in most quantitative evaluations...