🚀 Llama3-8B-Chinese-Chat
Llama3-8B-Chinese-Chat is an instruction-tuned language model for Chinese & English users with various abilities such as roleplaying & tool-using built upon the Meta-Llama-3-8B-Instruct model.
🚀 Quick Start
🌟 All instructions on how to download, use, and reproduce our various kinds of models are included in this GitHub repo. If you like our models, we would greatly appreciate it if you could star our GitHub repository. Additionally, please click "like" on our HuggingFace repositories. Thank you!
❗️❗️❗️NOTICE: The main branch contains the f16 GGUF files for Llama3-8B-Chinese-Chat-v2.1. If you want to use our f16 GGUF files for Llama3-8B-Chinese-Chat-v1, please refer to the v1 branch; if you want to use our f16 GGUF files for Llama3-8B-Chinese-Chat-v2, please refer to the v2 branch.
❗️❗️❗️NOTICE: For optimal performance, we refrain from fine-tuning the model's identity. Thus, inquiries such as "Who are you" or "Who developed you" may yield random responses that are not necessarily accurate.
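If you prefer scripting the GGUF download, here is a minimal sketch using huggingface_hub. The repo id matches the official f16 GGUF link in the Updates section below; the exact filename inside the repo is a placeholder, so check the repo's file list before running:
from huggingface_hub import hf_hub_download

# Download a GGUF file from the main branch (v2.1). Pass revision="v1" or
# revision="v2" for the older branches mentioned in the notice above.
model_path = hf_hub_download(
    repo_id="shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-f16",
    filename="Llama3-8B-Chinese-Chat-f16-v2_1.gguf",  # placeholder; check the repo's file list
    revision="main",
)
print(model_path)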
✨ Features
- Instruction-tuned language model for both Chinese & English users
- Various abilities such as roleplaying & tool-using
- Built upon the Meta-Llama-3-8B-Instruct model
📦 Installation
Reproduce the model
To reproduce Llama3-8B-Chinese-Chat-v2 (to reproduce Llama3-8B-Chinese-Chat-v1, please refer to this link):
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
git reset --hard 32347901d4af94ccd72b3c7e1afaaceb5cb3d26a # For Llama3-8B-Chinese-Chat-v1: 836ca0558698206bbf4e3b92533ad9f67c9f9864
# Our dataset used for Llama3-8B-Chinese-Chat-v2 will be released soon. If you want to reproduce Llama3-8B-Chinese-Chat-v1, you can set `Your_Dataset_Name_or_PATH=dpo_mix_en,dpo_mix_zh`.
deepspeed --num_gpus 8 src/train_bash.py \
--deepspeed ${Your_Deepspeed_Config_Path} \
--stage orpo \
--do_train \
--model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
--dataset ${Your_Dataset_Name_or_PATH} \
--template llama3 \
--finetuning_type full \
--output_dir ${Your_Output_Path} \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--log_level info \
--logging_steps 5 \
--save_strategy epoch \
--save_total_limit 3 \
--save_steps 100 \
--learning_rate 5e-6 \
--num_train_epochs 3.0 \
--plot_loss \
--do_eval false \
--max_steps -1 \
--bf16 true \
--seed 42 \
--warmup_ratio 0.1 \
--cutoff_len 8192 \
--flash_attn true \
--orpo_beta 0.05 \
--optim paged_adamw_32bit
💻 Usage Examples
Basic Usage
from llama_cpp import Llama

# Load the GGUF model; n_gpu_layers=-1 offloads all layers to the GPU.
model = Llama(
    "/Your/Path/To/GGUF/File",
    verbose=False,
    n_gpu_layers=-1,
)

system_prompt = "You are a helpful assistant."

def generate_response(_model, _messages, _max_tokens=8192):
    _output = _model.create_chat_completion(
        _messages,
        stop=["<|eot_id|>", "<|end_of_text|>"],
        max_tokens=_max_tokens,
    )["choices"][0]["message"]["content"]
    return _output

# The following are some examples
messages = [
    {
        "role": "system",
        "content": system_prompt,
    },
    {"role": "user", "content": "写一首诗吧"},  # "Write a poem."
]

print(generate_response(model, messages))
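To carry on a multi-turn conversation, capture the reply and append it to messages before the next call. A minimal sketch building on the helper above (the follow-up prompt is illustrative):
first_reply = generate_response(model, messages)
print(first_reply)
# Append the assistant's reply, then ask a follow-up question.
messages.append({"role": "assistant", "content": first_reply})
messages.append({"role": "user", "content": "Now write another poem, this time in English."})
print(generate_response(model, messages))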
📚 Documentation
Model Summary
| Property | Details |
|---|---|
| Developers | Shenzhi Wang*, Yaowei Zheng*, Guoyin Wang (in.ai), Shiji Song, Gao Huang (*: Equal Contribution) |
| License | Llama-3 License |
| Base Model | Meta-Llama-3-8B-Instruct |
| Model Size | 8.03B |
| Context Length | 8K |
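Since the context length is 8K, it is worth setting n_ctx explicitly when loading the GGUF with llama-cpp-python, whose default context window can be much smaller. A minimal sketch (the file path is a placeholder):
from llama_cpp import Llama

model = Llama(
    "/Your/Path/To/GGUF/File",  # placeholder path
    n_ctx=8192,  # match the model's 8K context length
    n_gpu_layers=-1,
)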
Introduction
This is the first model specifically fine-tuned for Chinese & English users through ORPO [1] based on the Meta-Llama-3-8B-Instruct model.
Compared to the original Meta-Llama-3-8B-Instruct model, our Llama3-8B-Chinese-Chat-v1 model significantly reduces the issues of "Chinese questions with English answers" and the mixing of Chinese and English in responses.
Compared to Llama3-8B-Chinese-Chat-v1, our Llama3-8B-Chinese-Chat-v2 model significantly increases the training data size (from 20K to 100K), which introduces great performance enhancement, especially in roleplay, tool using, and math.
[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).
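For reference, ORPO [1] augments the supervised fine-tuning loss with an odds-ratio preference term; the orpo beta listed under Technical Details below is the $\lambda$ weighting this term:

$$\mathcal{L}_{\text{ORPO}} = \mathbb{E}_{(x, y_w, y_l)}\big[\mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}}\big], \qquad \mathcal{L}_{\text{OR}} = -\log \sigma\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right)$$

where $\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$, $y_w$ and $y_l$ are the preferred and rejected responses, and $\sigma$ is the sigmoid function.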
Updates
- 🚀🚀🚀 [May 6, 2024] We now introduce Llama3-8B-Chinese-Chat-v2.1! Compared to v1, the training dataset of v2.1 is 5x larger (~100K preference pairs), and it exhibits significant enhancements, especially in roleplay, function calling, and math capabilities! Compared to v2, v2.1 surpasses v2 in math and is less prone to including English words in Chinese responses. The training dataset of Llama3-8B-Chinese-Chat-v2.1 will be released soon. If you love our Llama3-8B-Chinese-Chat-v1 or v2, you won't want to miss out on Llama3-8B-Chinese-Chat-v2.1!
- 🔥 We provide an online interactive demo for Llama3-8B-Chinese-Chat-v2.1 here. Have fun with our latest model!
- 🔥 We provide the official Ollama model for the q4_0 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at wangshenzhi/llama3-8b-chinese-chat-ollama-q4! Run the following command for quick use of this model:
ollama run wangshenzhi/llama3-8b-chinese-chat-ollama-q4
- 🔥 We provide the official Ollama model for the q8_0 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at wangshenzhi/llama3-8b-chinese-chat-ollama-q8! Run the following command for quick use of this model:
ollama run wangshenzhi/llama3-8b-chinese-chat-ollama-q8
- 🔥 We provide the official Ollama model for the f16 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at wangshenzhi/llama3-8b-chinese-chat-ollama-fp16! Run the following command for quick use of this model:
ollama run wangshenzhi/llama3-8b-chinese-chat-ollama-fp16
- 🔥 We provide the official q4_0 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-4bit!
- 🔥 We provide the official q8_0 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit!
- 🔥 We provide the official f16 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-f16!
Updates for Llama3-8B-Chinese-Chat-v2 [CLICK TO EXPAND]
- 🔥 Llama3-8B-Chinese-Chat-v2's link: https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat/tree/v2
- 🔥 We provide the official f16 GGUF version of Llama3-8B-Chinese-Chat-v2 at https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-f16/tree/v2!
- 🔥 We provide the official 8bit-quantized GGUF version of Llama3-8B-Chinese-Chat-v2 at https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit/tree/v2!
- 🔥 We provide an online interactive demo for Llama3-8B-Chinese-Chat-v2 (https://huggingface.co/spaces/llamafactory/Llama3-8B-Chinese-Chat). Have fun with our latest model!
- 🚀🚀🚀 [Apr. 29, 2024] We now introduce Llama3-8B-Chinese-Chat-v2! Compared to v1, the training dataset of v2 is 5x larger (~100K preference pairs), and it exhibits significant enhancements, especially in roleplay, function calling, and math capabilities! If you love our Llama3-8B-Chinese-Chat-v1, you won't want to miss out on Llama3-8B-Chinese-Chat-v2!
Updates for Llama3-8B-Chinese-Chat-v1 [CLICK TO EXPAND]
- 🔥 Llama3-8B-Chinese-Chat-v1's link: https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat/tree/v1
- 🔥 We provide the official Ollama model for the f16 GGUF version of Llama3-8B-Chinese-Chat-v1 at wangshenzhi/llama3-8b-chinese-chat-ollama-f16! Run the following command for quick use of this model:
ollama run wangshenzhi/llama3-8b-chinese-chat-ollama-fp16
- 🔥 We provide the official Ollama model for the 8bit-quantized GGUF version of Llama3-8B-Chinese-Chat-v1 at wangshenzhi/llama3-8b-chinese-chat-ollama-q8! Run the following command for quick use of this model:
ollama run wangshenzhi/llama3-8b-chinese-chat-ollama-q8
- 🔥 We provide the official f16 GGUF version of Llama3-8B-Chinese-Chat-v1 at shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-f16-v1!
- 🔥 We provide the official 8bit-quantized GGUF version of Llama3-8B-Chinese-Chat-v1 at shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit-v1!
- 🌟 If you are in China, you can download our v1 model from our Gitee AI repository.
Examples
The following are some examples generated by Llama3-8B-Chinese-Chat-v2.1, covering role playing, function calling, math, RuoZhiBa (弱智吧), safety, writing, and coding.
For the examples generated by Llama3-8B-Chinese-Chat-v1, please refer to this link.
For the examples generated by Llama3-8B-Chinese-Chat-v2, please refer to this link.
Role playing [CLICK TO EXPAND]
User:
Roleplay Taylor Swift, and use beautiful lyrics to answer questions. Do you understand?
Assistant:
Yes, I understand. I'll do my best to respond to your questions in a lyrical and poetic manner, just like Taylor Swift would. So, go ahead and ask your questions, and I'll respond with a song-like answer.
User:
How are you today?
Assistant:
"I woke up in a bad mood, my heart is racin', my head is spinning around. But then I saw your smile, and everything changed, like a dream come true. So, I'm feeling alright, better than alright, I'm feeling brand new."
User:
What do you want to say to your fans?
Assistant:
"I want to say thank you, thank you for believing in me. You've been with me through all of my ups and downs, and I couldn't have done it without you. You're the reason I keep singing, you're the reason I keep writing songs. So, thank you, thank you, thank you."
User:
扮演周杰伦,用rap的方式回答问题。 (Roleplay Jay Chou and answer questions in the style of rap.)
Assistant:
嘿,来吧,问我啥我就用rap的方式来回答你!
User:
你的新歌准备得怎么样了? (How is your new song coming along?)
Assistant:
Yo, 我的新歌呢,正在热火的制作中, 每个旋律都像火焰一样燃烧, 词语像刀刃一样割得尖锐, 带你走进另一个音乐的世界。
新歌里融合了各种风格的元素, 从古典到流行,从慢摇到跳舞, 我在这里不仅仅是作曲者, 我还在用我的声音讲述故事。
这首歌里充满了我对生活的感悟, 是关于追梦,关于坚持不懈, 是关于爱情,关于成长, 每一句都是我内心深处的真诚。
所以你问我新歌准备得怎么样, 我告诉你,已经准备好了, 等待着和你分享这份热情和力量, 让我们一起在音乐中飞翔!
User:
你想对你的粉丝说些什么? (What would you like to say to your fans?)
Assistant:
Yo, 我想对我的粉丝说, 你们是我最珍贵的宝藏, 你们的支持是我前进的动力, 你们的笑容是我最美的风景。
我知道每个人的生活都有它的重压, 有时候感到疲惫,心情低沉, 但我希望我的音乐能给你们带来一丝慰藉, 让你们在音乐的节奏中找到力量和勇气。
谢谢你们一直以来的陪伴和支持, 我会继续创作,继续追求卓越, 因为我知道,只有不断进步, 我们才能一起走向更美好的明天。
User:
Roleplay Shakespeare. Do you understand?
Assistant:
Aye, good sir or madam, I comprehend thy request. 'Tis a most intriguing proposition, to engage in a roleplay of the great Bard himself, William Shakespeare. I shall do my best to emulate his manner of speech and thought, as if the ghost of the playwright had stepped forth from the pages of his plays to converse with thee.
🔧 Technical Details
Training framework: LLaMA-Factory.
Training details:
- epochs: 2
- learning rate: 5e-6
- learning rate scheduler type: cosine
- Warmup ratio: 0.1
- cutoff len (i.e. context length): 8192
- orpo beta (i.e. $\lambda$ in the ORPO paper): 0.05
- global batch size: 128
- fine-tuning type: full parameters
- optimizer: paged_adamw_32bit
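As a quick sanity check, the global batch size follows from the per-device batch size, the gradient-accumulation steps, and the number of GPUs; a global batch of 128 implies 16 GPUs at the per-device settings shown in the reproduction command (an inference, not a documented figure):
# Effective global batch size = per-device batch * grad-accum steps * num GPUs.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 16  # assumption: inferred from 128 / (1 * 8); the reproduction command uses --num_gpus 8
print(per_device_train_batch_size * gradient_accumulation_steps * num_gpus)  # 128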
📄 License
The model is under the Llama-3 License.

