Llama3-8B-Chinese-Chat open-source chat model: better Chinese answers and less Chinese-English mixing!


Llama3 8B Chinese Chat

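Developed by shenzhi-wang

A Chinese-English chat model fine-tuned from Meta-Llama-3-8B-Instruct, with better Chinese answer quality and less mixing of Chinese and English in replies.

Large language model

Transformers

Multilingual #Chinese-English bilingual chat #Role-play optimization #Function calling support

Downloads: 2,777

Release date: 4/21/2024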

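Model overview

An instruction-tuned language model designed for Chinese and English users, with capabilities such as role-play and tool use.

Model features

Better Chinese answers

Markedly fewer cases of answering a Chinese question in English or mixing Chinese and English in one reply

Expanded training data

v2.1 is trained on about 5 times as much data as v1 (roughly 100K preference pairs)

Broader capability gains

Marked improvements in role-play, function calling, and math

Multiple quantized versions

GGUF builds in q4_0, q8_0, f16, and other quantization formats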

Capabilities

Chinese chat

English chat

Role-play

Function calling

Mathematical reasoning

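Use cases

Conversational AI

Chinese Q&A

Answers a wide range of user questions in Chinese

Answer quality claimed to surpass ChatGPT

Role-play

Holds conversations in the persona of a given character

Markedly improved performance

Education support

Math problem solving

Helps solve math problems

Better math ability than earlier versions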

🚀 Llama3-8B-Chinese-Chat

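Llama3-8B-Chinese-Chat is a language model instruction-tuned from Meta-Llama-3-8B-Instruct for Chinese and English users. It supports role-play, tool use, and other capabilities for a better interactive experience.

🚀 Quick start

Load and run the model with the code below (requires transformers and PyTorch; device_map="auto" also needs accelerate):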

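from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "shenzhi-wang/Llama3-8B-Chinese-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" spreads layers over available devices
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "写一首诗吧"},  # "Write a poem"
]

# Apply the model's chat template and append the assistant header so the model starts replying
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=8192,  # upper bound only; prompt + reply beyond 8,192 tokens exceed the model's context
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# generate() returns prompt + continuation; keep only the newly generated tokens
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))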
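To make the `apply_chat_template(..., add_generation_prompt=True)` step and the `outputs[0][input_ids.shape[-1]:]` slice less opaque, the sketch below hand-renders the prompt string that call is expected to produce. It assumes the repository ships the standard Llama 3 chat template (training used LLaMA-Factory's `llama3` template); `render_llama3_prompt` is a hypothetical helper for illustration, and real code should keep calling `tokenizer.apply_chat_template`.

```python
# Hypothetical helper: a hand-rolled approximation of the standard Llama 3 chat template.
# Assumption: the model's tokenizer uses this template unchanged.

BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    """Llama 3 role header followed by a blank line."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def render_llama3_prompt(messages: list, add_generation_prompt: bool = True) -> str:
    parts = [BOS]
    for message in messages:
        # Each turn: role header, trimmed content, end-of-turn token.
        parts.append(header(message["role"]) + message["content"].strip() + EOT)
    if add_generation_prompt:
        # Open an empty assistant turn; everything generated after it is the reply.
        parts.append(header("assistant"))
    return "".join(parts)


prompt = render_llama3_prompt([{"role": "user", "content": "写一首诗吧"}])
print(repr(prompt))
```

Because the prompt already occupies the first `input_ids.shape[-1]` positions of the output, slicing them off leaves only the assistant's reply.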

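✨ Key features

Bilingual support: designed for Chinese and English users; effectively reduces cases of answering a Chinese question in English and of mixing Chinese and English within a reply.
Improved performance: compared with v1, v2 uses substantially more training data (from 20K to 100K) and performs better at role-play, tool use, and math.
Multiple versions: v1, v2, v2.1, and others, plus several quantized formats (e.g. f16 GGUF and 8-bit quantized GGUF).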
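When choosing between the quantized formats, a rough weight-storage estimate helps. The sketch below multiplies the 8.03B parameter count by the bits per weight of each format, assuming every tensor uses that format and using the llama.cpp block layouts (Q8_0: 32 int8 weights plus one fp16 scale; Q4_0: 32 4-bit weights plus one fp16 scale). Real files may differ, since some tensors can be stored at another precision and the file carries metadata.

```python
# Rough weight-storage estimate for GGUF variants (assumption: uniform format, no metadata).
PARAMS = 8.03e9

BITS_PER_WEIGHT = {
    "f16": 16.0,          # 2 bytes per weight
    "q8_0": 34 * 8 / 32,  # 34-byte block per 32 weights -> 8.5 bits
    "q4_0": 18 * 8 / 32,  # 18-byte block per 32 weights -> 4.5 bits
}


def estimated_size_gb(fmt: str, params: float = PARAMS) -> float:
    """Approximate size in GB (1 GB = 1e9 bytes)."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{estimated_size_gb(fmt):.2f} GB")
```

This prints roughly 16.06 GB for f16, 8.53 GB for q8_0, and 4.52 GB for q4_0, before any KV-cache or runtime overhead.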

📦 Installation guide

To reproduce Llama3-8B-Chinese-Chat-v2, follow these steps:

git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
git reset --hard 32347901d4af94ccd72b3c7e1afaaceb5cb3d26a    # For Llama3-8B-Chinese-Chat-v1: 836ca0558698206bbf4e3b92533ad9f67c9f9864

# Our dataset used for Llama3-8B-Chinese-Chat-v2 will be released soon. If you want to reproduce Llama3-8B-Chinese-Chat-v1, you can set `Your_Dataset_Name_or_PATH=dpo_mix_en,dpo_mix_zh`.
deepspeed --num_gpus 8 src/train_bash.py \
    --deepspeed ${Your_Deepspeed_Config_Path} \
    --stage orpo \
    --do_train \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --dataset ${Your_Dataset_Name_or_PATH} \
    --template llama3 \
    --finetuning_type full \
    --output_dir ${Your_Output_Path} \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --lr_scheduler_type cosine \
    --log_level info \
    --logging_steps 5 \
    --save_strategy epoch \
    --save_total_limit 1 \
    --save_steps 100 \
    --learning_rate 3e-6 \
    --num_train_epochs 2.0 \
    --plot_loss \
    --do_eval false \
    --max_steps -1 \
    --bf16 true \
    --seed 42 \
    --warmup_ratio 0.1 \
    --cutoff_len 8192 \
    --flash_attn true \
    --orpo_beta 0.05 \
    --optim paged_adamw_32bit
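The command's batch settings determine the global batch size listed under Training details. The sketch below reproduces that arithmetic and, assuming about 100K preference pairs (the exact count is not given), estimates optimizer steps and warmup steps. These are rough figures; the trainer's own rounding may differ by a step.

```python
import math

# Values copied from the deepspeed command above.
NUM_GPUS = 8
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 16
NUM_EPOCHS = 2
WARMUP_RATIO = 0.1

# Assumption: roughly 100K preference pairs, as stated for v2/v2.1.
NUM_PAIRS = 100_000

global_batch = NUM_GPUS * PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(NUM_PAIRS / global_batch)
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

print(global_batch, steps_per_epoch, total_steps, warmup_steps)
```

The global batch of 8 × 1 × 16 = 128 matches the value reported below; with a different GPU count, adjust `--gradient_accumulation_steps` to keep it at 128.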

💻 Usage examples

Basic usage

Basic usage is identical to the code in 🚀 Quick start above.

Advanced usage

Depending on the task, adjust generation parameters such as max_new_tokens, temperature, and top_p to get responses closer to what you need.
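One way to organize these adjustments is a small table of presets plus a cap that keeps prompt + output within the 8K context. The helper name `build_generation_kwargs` and the preset values other than the documented `temperature=0.6, top_p=0.9` are illustrative assumptions, not recommendations from the model authors.

```python
# Hypothetical helper; preset names and values are illustrative assumptions.
CONTEXT_LENGTH = 8192  # Llama 3 8K context window

PRESETS = {
    # Sampling settings from the Quick start example.
    "chat": {"do_sample": True, "temperature": 0.6, "top_p": 0.9},
    # Greedy decoding for reproducible output, e.g. math or function calling.
    "deterministic": {"do_sample": False},
    # Higher temperature for more varied creative writing or role-play.
    "creative": {"do_sample": True, "temperature": 0.9, "top_p": 0.95},
}


def build_generation_kwargs(preset: str, prompt_len: int, max_new_tokens: int = 1024) -> dict:
    """Return kwargs for model.generate, capping output so prompt + output fit in the context."""
    budget = CONTEXT_LENGTH - prompt_len
    if budget <= 0:
        raise ValueError(f"prompt of {prompt_len} tokens already fills the {CONTEXT_LENGTH}-token context")
    kwargs = dict(PRESETS[preset])  # copy so the preset table is never mutated
    kwargs["max_new_tokens"] = min(max_new_tokens, budget)
    return kwargs


print(build_generation_kwargs("deterministic", prompt_len=8000))
```

With the Quick start code, this would be used as `model.generate(input_ids, **build_generation_kwargs("chat", input_ids.shape[-1]))`.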

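📚 Detailed documentation

Model overview

Llama3-8B-Chinese-Chat is a language model fine-tuned from Meta-Llama-3-8B-Instruct for Chinese and English users, with capabilities such as role-play and tool use.

Developer information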

Shenzhi Wang*
Yaowei Zheng*
Guoyin Wang (in.ai)
Shiji Song
Gao Huang

(*: equal contribution)

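Model information

Attribute	Details
Model type	Instruction-tuned language model
Training data	Mixed Chinese-English dataset
License	Llama-3 License
Base model	Meta-Llama-3-8B-Instruct
Model size	8.03B parameters
Context length	8K tokens

Training framework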

LLaMA-Factory

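Training details

Epochs: 2
Learning rate: 3e-6
Learning-rate scheduler: cosine
Warmup ratio: 0.1
Cutoff length: 8192
ORPO beta (orpo_beta): 0.05
Global batch size: 128
Fine-tuning type: full-parameter fine-tuning
Optimizer: paged_adamw_32bit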
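The learning rate, scheduler, and warmup ratio combine into one schedule. The sketch below follows the shape of transformers' `get_cosine_schedule_with_warmup` with its default half cosine: linear warmup from 0 to the peak, then cosine decay to 0. The total step count used in the printout is a hypothetical value.

```python
import math

PEAK_LR = 3e-6
WARMUP_RATIO = 0.1


def lr_at(step: int, total_steps: int, peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total = 1000  # hypothetical total optimizer steps
for s in (0, 50, 100, 550, 1000):
    print(s, f"{lr_at(s, total):.2e}")
```

The rate reaches 3e-6 at the end of warmup (step 100 here), is halfway down at the midpoint of the decay phase, and reaches 0 at the last step.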

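🔧 Technical details

This is the first model fine-tuned from Meta-Llama-3-8B-Instruct specifically for Chinese and English users using ORPO [1].

Compared with the original Meta-Llama-3-8B-Instruct, our Llama3-8B-Chinese-Chat-v1 markedly reduces cases of answering a Chinese question in English and of mixing Chinese and English within a reply.
Compared with Llama3-8B-Chinese-Chat-v1, our Llama3-8B-Chinese-Chat-v2 uses substantially more training data (from 20K to 100K) and performs much better at role-play, tool use, and math.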
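To spot-check the "mixed Chinese and English" behavior on your own prompts, a crude heuristic is the share of CJK ideographs among ideographs and Latin words in a reply. This is a hypothetical check, not the authors' evaluation method, and the 0.8 threshold is an arbitrary assumption.

```python
import re

# Hypothetical heuristic, not the authors' evaluation method.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs block
LATIN_WORD = re.compile(r"[A-Za-z]+")


def cjk_ratio(text: str) -> float:
    """Share of CJK ideographs among ideographs + Latin words; 0.0 if the text has neither."""
    cjk = len(CJK_CHAR.findall(text))
    latin = len(LATIN_WORD.findall(text))
    total = cjk + latin
    return cjk / total if total else 0.0


def looks_mixed(reply: str, threshold: float = 0.8) -> bool:
    """Flag a reply to a Chinese prompt whose CJK share falls below `threshold`."""
    return cjk_ratio(reply) < threshold


print(looks_mixed("床前明月光，疑是地上霜。"), looks_mixed("这个 model 很 good"))
```

Code snippets, names, and technical terms legitimately contain Latin text, so treat a flag as a prompt for manual review rather than a verdict.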

[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).
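ORPO [1] adds an odds-ratio penalty to the usual supervised loss: for a chosen reply y_w and rejected reply y_l, L = L_SFT + λ · L_OR, with L_OR = -log σ(log(odds(y_w|x) / odds(y_l|x))) and odds(y|x) = P(y|x) / (1 - P(y|x)), where P is the length-normalized likelihood. The per-pair sketch below works from average per-token log-probabilities and assumes LLaMA-Factory's `orpo_beta` (0.05 here) plays the role of λ.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.05) -> float:
    """ORPO objective for one preference pair: NLL of the chosen reply + lam * odds-ratio loss."""
    sft_loss = -chosen_avg_logp
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    or_loss = softplus(-log_odds_ratio)  # -log(sigmoid(z)) == softplus(-z)
    return sft_loss + lam * or_loss


print(orpo_loss(-0.5, -2.0), orpo_loss(-0.5, -0.5))
```

When the chosen and rejected replies are equally likely, the odds-ratio term equals log 2; making the chosen reply relatively more likely pushes it toward 0, which is what steers the model away from the rejected (e.g. English or mixed-language) answers.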

📄 License

This model is released under the Llama-3 License.

📖 Citation

If Llama3-8B-Chinese-Chat is useful to you, please cite it as follows:

@misc {shenzhi_wang_2024,
	author       = {Wang, Shenzhi and Zheng, Yaowei and Wang, Guoyin and Song, Shiji and Huang, Gao},
	title        = { Llama3-8B-Chinese-Chat (Revision 6622a23) },
	year         = 2024,
	url          = { https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat },
	doi          = { 10.57967/hf/2316 },
	publisher    = { Hugging Face }
}

⚠️ Important notes

The main branch contains the files for Llama3-8B-Chinese-Chat-v2.1. To use Llama3-8B-Chinese-Chat-v1, see the v1 branch; to use Llama3-8B-Chinese-Chat-v2, see the v2 branch.
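Branches can be selected through the `revision` argument of `from_pretrained`. The sketch below maps a version label to loading kwargs; `from_pretrained_kwargs` is a hypothetical helper, and it assumes the branches are literally named `v1` and `v2` as the note above suggests, with `main` holding v2.1.

```python
# Hypothetical helper; branch names "v1"/"v2" are an assumption based on the note above.
MODEL_ID = "shenzhi-wang/Llama3-8B-Chinese-Chat"

BRANCH_FOR_VERSION = {"v2.1": "main", "v2": "v2", "v1": "v1"}


def from_pretrained_kwargs(version: str = "v2.1") -> dict:
    """Keyword arguments for AutoTokenizer / AutoModelForCausalLM.from_pretrained."""
    try:
        branch = BRANCH_FOR_VERSION[version]
    except KeyError:
        raise ValueError(f"unknown version {version!r}; expected one of {sorted(BRANCH_FOR_VERSION)}") from None
    return {"pretrained_model_name_or_path": MODEL_ID, "revision": branch}


print(from_pretrained_kwargs("v1"))
```

Pass the same kwargs to both the tokenizer and the model, e.g. `AutoModelForCausalLM.from_pretrained(**from_pretrained_kwargs("v1"), torch_dtype="auto", device_map="auto")`, so they come from the same branch.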

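For best performance, we did not fine-tune the model's identity. Questions such as "Who are you?" or "Who developed you?" may therefore get random answers that are not necessarily accurate.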