Llama3-8B-Chinese-Chat: an open-source chat model with better Chinese answers and less Chinese-English mixing

Llama3 8B Chinese Chat

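Developed by shenzhi-wang

A Chinese-English chat model fine-tuned from Meta-Llama-3-8B-Instruct, with improved Chinese answer quality and less mixing of Chinese and English in replies.

Large language model

Transformers

Multilingual support #bilingual Chinese-English chat #role-play optimization #function-calling support

Downloads: 2,777

Released: 4/21/2024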

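Model overview

An instruction-tuned language model built for Chinese and English users, with abilities such as role-play and tool use.

Model features

Improved Chinese answer quality

Substantially fewer cases of answering a Chinese question in English or mixing Chinese and English in one reply

Expanded training data

The v2.1 training set is 5 times larger than v1's (about 100K preference pairs)

Gains across several abilities

Markedly better performance in role-play, function calling, and math

Multiple quantized versions

GGUF files are provided in several formats, including q4_0, q8_0, and f16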
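To help choose between the GGUF variants, the sketch below estimates the weight size of each format. It assumes the 8.03B parameter count listed in the model information on this page and the usual GGML block layouts (q4_0 and q8_0 store one 16-bit scale per block of 32 weights). Real files differ somewhat because of metadata and tensors kept at other precisions, so treat the numbers as rough guidance only.

```python
# Rough GGUF weight-size estimate per quantization type.
# Bits-per-weight values follow the GGML block layouts:
#   q4_0: 32 weights x 4 bits + one 16-bit scale = 144 bits / 32 = 4.5
#   q8_0: 32 weights x 8 bits + one 16-bit scale = 272 bits / 32 = 8.5
#   f16 : 16 bits per weight, no block overhead
BITS_PER_WEIGHT = {"q4_0": 4.5, "q8_0": 8.5, "f16": 16.0}

N_PARAMS = 8.03e9  # parameter count taken from the model information table


def approx_size_gb(quant: str, n_params: float = N_PARAMS) -> float:
    """Approximate weight size in GB (10^9 bytes), ignoring file metadata."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{approx_size_gb(quant):.1f} GB")
```

Under these assumptions q4_0 needs roughly a quarter of the memory of f16, which is usually the deciding factor on consumer hardware.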

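Model capabilities

Chinese dialogue

English dialogue

Role-play

Function calling

Mathematical reasoning

Use cases

Conversational assistant

Chinese question answering

Answers a wide range of user questions in Chinese

Claimed to surpass ChatGPT in answer quality (no benchmark is given on this page)

Role-play

Holds a conversation in the voice of a specified character

Markedly improved performance

Education support

Solving math problems

Helps work through math problems

Math ability improved over earlier versions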

🚀 Llama3-8B-Chinese-Chat

Llama3-8B-Chinese-Chat is a language model instruction-tuned from Meta-Llama-3-8B-Instruct for Chinese and English users. It supports abilities such as role-play and tool use.

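🚀 Quick start

You can use Llama3-8B-Chinese-Chat as follows:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "shenzhi-wang/Llama3-8B-Chinese-Chat"

# Load the tokenizer and model; "auto" uses the checkpoint dtype and the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "寫一首詩吧"},  # "Write a poem"
]

# Format the conversation with the model's chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sample a reply; max_new_tokens is an upper bound, and the prompt also counts toward the 8K context.
outputs = model.generate(
    input_ids,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Drop the prompt tokens and decode only the newly generated reply.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))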
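The call to apply_chat_template above turns the message list into a single prompt string before tokenizing. As an illustration, the sketch below renders messages in the standard Llama 3 Instruct prompt format. It assumes this fine-tune keeps the base model's template unchanged; the template bundled with the tokenizer is authoritative, and render_llama3_prompt is a hypothetical helper, not part of transformers.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the Llama 3 Instruct prompt format.

    Illustrative only: in practice, call tokenizer.apply_chat_template,
    which uses the template shipped with the model.
    """
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the content, then an end-of-turn token.
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


if __name__ == "__main__":
    print(render_llama3_prompt([{"role": "user", "content": "寫一首詩吧"}]))
```

Seeing the raw format makes it clearer why add_generation_prompt=True matters: without the trailing assistant header, the model has no cue that it should start a new reply.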

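✨ Key features

Bilingual support: built for Chinese and English users, it reduces both "Chinese question, English answer" replies and replies that mix Chinese and English.
Better performance: compared with v1, v2 was trained on substantially more data (from 20K to 100K) and performs better at role-play, tool use, and math.
Multiple versions: v1, v2, and v2.1 are available, along with several quantized formats (such as f16 GGUF and 8-bit quantized GGUF).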

📦 Installation guide

To reproduce Llama3-8B-Chinese-Chat-v2, follow these steps:

git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory

# Pin the repository to the commit used for training (run inside the cloned repo).
git reset --hard 32347901d4af94ccd72b3c7e1afaaceb5cb3d26a    # For Llama3-8B-Chinese-Chat-v1: 836ca0558698206bbf4e3b92533ad9f67c9f9864

# Our dataset used for Llama3-8B-Chinese-Chat-v2 will be released soon. If you want to reproduce Llama3-8B-Chinese-Chat-v1, you can set `Your_Dataset_Name_or_PATH=dpo_mix_en,dpo_mix_zh`.
deepspeed --num_gpus 8 src/train_bash.py \
    --deepspeed ${Your_Deepspeed_Config_Path} \
    --stage orpo \
    --do_train \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --dataset ${Your_Dataset_Name_or_PATH} \
    --template llama3 \
    --finetuning_type full \
    --output_dir ${Your_Output_Path} \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --lr_scheduler_type cosine \
    --log_level info \
    --logging_steps 5 \
    --save_strategy epoch \
    --save_total_limit 1 \
    --save_steps 100 \
    --learning_rate 3e-6 \
    --num_train_epochs 2.0 \
    --plot_loss \
    --do_eval false \
    --max_steps -1 \
    --bf16 true \
    --seed 42 \
    --warmup_ratio 0.1 \
    --cutoff_len 8192 \
    --flash_attn true \
    --orpo_beta 0.05 \
    --optim paged_adamw_32bit
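The flags in the command above determine the effective batch size: 8 GPUs, a per-device batch of 1, and 16 gradient-accumulation steps give the global batch size of 128 reported in the training details. The sketch below works through that arithmetic and a rough optimizer-step count. The 100,000-example dataset size is an assumption based on the "100K" figure on this page, and the exact step count depends on how the trainer handles partial batches.

```python
def global_batch_size(num_gpus: int, per_device_batch_size: int, grad_accum_steps: int) -> int:
    """Number of examples contributing to one optimizer update."""
    return num_gpus * per_device_batch_size * grad_accum_steps


def approx_optimizer_steps(num_examples: int, num_epochs: int, batch_size: int) -> int:
    """Rough update count, ignoring any final partial batch in each epoch."""
    return (num_examples // batch_size) * num_epochs


if __name__ == "__main__":
    batch = global_batch_size(num_gpus=8, per_device_batch_size=1, grad_accum_steps=16)
    print("global batch size:", batch)
    # 100_000 is an assumed dataset size, not a figure confirmed by the authors.
    print("approx. optimizer steps:", approx_optimizer_steps(100_000, 2, batch))
```

When reproducing on fewer GPUs, raising --gradient_accumulation_steps proportionally keeps the global batch size, and therefore the effective learning-rate regime, the same.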

💻 Usage examples

Basic usage

Basic usage is the same as the snippet in the 🚀 quick-start section above: load the tokenizer and model, format the messages with apply_chat_template, call generate, and decode only the newly generated tokens.

Advanced usage

Depending on the scenario, you can adjust generation parameters such as max_new_tokens, temperature, and top_p to get replies closer to what you expect.
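To make the effect of temperature and top_p concrete, here is a minimal standard-library sketch of the two steps applied to a toy logit vector. It is a simplified illustration of temperature scaling and nucleus (top-p) filtering, not the transformers implementation; the function names and the toy logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up values for illustration
    for t in (1.0, 0.6):
        probs = softmax_with_temperature(toy_logits, temperature=t)
        print(f"temperature={t}:", [round(p, 3) for p in probs])
        print("  after top_p=0.9:", top_p_filter(probs, top_p=0.9))
```

Lower temperature and lower top_p both concentrate sampling on the most likely tokens, which tends to give more deterministic replies; higher values allow more varied output.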

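📚 Detailed documentation

Model overview

Llama3-8B-Chinese-Chat is a language model fine-tuned from Meta-Llama-3-8B-Instruct for Chinese and English users, with abilities such as role-play and tool use.

Developers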

Shenzhi Wang*
Yaowei Zheng*
Guoyin Wang (in.ai)
Shiji Song
Gao Huang

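(*: equal contribution)

Model information

Property	Details
Model type	Instruction-tuned language model
Training data	Mixed Chinese-English dataset
License	Llama-3 License
Base model	Meta-Llama-3-8B-Instruct
Model size	8.03B parameters
Context length	8K tokens

Training framework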

LLaMA-Factory

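Training details

Epochs: 2
Learning rate: 3e-6
Learning-rate scheduler type: cosine
Warmup ratio: 0.1
Cutoff length: 8192
ORPO beta: 0.05
Global batch size: 128
Fine-tuning type: full-parameter fine-tuning
Optimizer: paged_adamw_32bit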
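The learning-rate settings above (peak 3e-6, cosine scheduler, warmup ratio 0.1) can be sketched as a function of the training step. This assumes the common shape of linear warmup from zero followed by cosine decay to zero; total_steps is a hypothetical value, since the real step count depends on the dataset and trainer.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 3e-6, warmup_ratio: float = 0.1) -> float:
    """Learning rate with linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; not a figure from the authors
    for step in (0, 50, 100, 550, 1000):
        print(step, f"{lr_at_step(step, total_steps):.2e}")
```

With these assumptions the learning rate peaks at the end of the first 10% of training and then falls smoothly, reaching half the peak at the midpoint of the decay phase.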

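🔧 Technical details

This is the first model fine-tuned from Meta-Llama-3-8B-Instruct specifically for Chinese and English users using ORPO [1].

Compared with the original Meta-Llama-3-8B-Instruct, our Llama3-8B-Chinese-Chat-v1 substantially reduces both "Chinese question, English answer" replies and replies that mix Chinese and English.
Compared with Llama3-8B-Chinese-Chat-v1, our Llama3-8B-Chinese-Chat-v2 was trained on substantially more data (from 20K to 100K) and performs much better at role-play, tool use, and math.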

[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).
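For readers unfamiliar with ORPO, the sketch below shows the shape of its objective for one preference pair, following the formulation in [1]: a supervised loss on the chosen answer plus a weighted odds-ratio term that pushes the chosen answer's odds above the rejected one's. It assumes the inputs are length-normalized (average per-token) log-probabilities and that the orpo_beta value of 0.05 plays the role of the weighting coefficient; it is an illustration, not the LLaMA-Factory implementation.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, beta: float = 0.05) -> float:
    """ORPO-style loss for one (chosen, rejected) pair."""
    # Supervised term: negative log-likelihood of the chosen answer.
    sft_loss = -chosen_avg_logp
    # Odds-ratio term: -log(sigmoid(x)) written as log(1 + exp(-x)).
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    odds_ratio_loss = math.log1p(math.exp(-log_odds_ratio))
    return sft_loss + beta * odds_ratio_loss


if __name__ == "__main__":
    # Made-up log-probabilities for illustration.
    print(orpo_loss(-0.5, -0.5))  # no preference margin yet
    print(orpo_loss(-0.5, -2.0))  # chosen answer already much more likely
```

Because the preference signal is folded into the supervised objective, ORPO needs no separate reference model, which is what makes full-parameter fine-tuning of an 8B model in a single stage practical.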

📄 License

This model is released under the Llama-3 License.

📖 Citation

If our Llama3-8B-Chinese-Chat is helpful to you, please cite it as follows:

@misc {shenzhi_wang_2024,
	author       = {Wang, Shenzhi and Zheng, Yaowei and Wang, Guoyin and Song, Shiji and Huang, Gao},
	title        = { Llama3-8B-Chinese-Chat (Revision 6622a23) },
	year         = 2024,
	url          = { https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat },
	doi          = { 10.57967/hf/2316 },
	publisher    = { Hugging Face }
}

⚠️ Important notes

The main branch contains the files for Llama3-8B-Chinese-Chat-v2.1. To use Llama3-8B-Chinese-Chat-v1, see the v1 branch; to use Llama3-8B-Chinese-Chat-v2, see the v2 branch.
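A specific version can be selected at load time with the revision argument of from_pretrained. The sketch below assumes the branches are literally named "v1" and "v2" on the Hugging Face Hub, which this page implies but does not spell out; revision_for and load_model are hypothetical helpers.

```python
def revision_for(version: str) -> str:
    """Map a model version to a Hub revision (branch names are assumptions)."""
    revisions = {"v1": "v1", "v2": "v2", "v2.1": "main"}
    return revisions[version]


def load_model(version: str = "v2.1"):
    """Download and load the tokenizer and model for the requested version."""
    # Imported lazily so revision_for can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "shenzhi-wang/Llama3-8B-Chinese-Chat"
    revision = revision_for(version)
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, revision=revision, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model
```

Calling load_model("v1") would then fetch the v1 weights instead of the default v2.1 files on the main branch.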

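To get the best performance, we did not fine-tune the model's identity. As a result, questions such as "Who are you?" or "Who developed you?" may produce arbitrary answers that are not necessarily accurate.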