The open-source model llama-3-youko-8b-instruct - Supports Japanese and English bilingual instructions and enables free intelligent interaction.

Llama 3 Youko 8b Instruct

Developed by rinna

A Japanese and English instruction tuning model based on Llama-3, integrating SFT, chat vector and DPO technologies

Supports Multiple Languages #Japanese instruction optimization #Multi-technology fusion tuning #Japanese-English bilingual support

Downloads 852

Release Time: 7/21/2024

Model Overview

This model is an instruction-tuned version of rinna/llama-3-youko-8b. It uses supervised fine-tuning (SFT), chat vectors and direct preference optimization (DPO) technologies to respond to instructions more accurately and supports Japanese and English.

Model Features

Multi-technology fusion tuning

Combining supervised fine-tuning (SFT), chat vectors and direct preference optimization (DPO) technologies to improve instruction following ability

Multi-language support

Supports Japanese and English, suitable for different language scenarios

Multi-dataset training

Trained using multiple public datasets and rinna's own datasets to improve the model's generalization ability

Llama-3 chat format

Using the Llama-3 chat format to respond to instructions more accurately

Model Capabilities

Japanese text generation

English text generation

Instruction following

Dialogue system

Use Cases

Intelligent assistant

Japanese Q&A system

Used to build a Japanese intelligent Q&A assistant

Intended for answering questions about Japanese culture, history, and similar topics

Multi-language application

Multi-language chatbot

Build a chatbot that supports Japanese and English

🚀 Llama 3 Youko 8B Instruct (rinna/llama-3-youko-8b-instruct)

This is an instruction-tuned language model based on Llama 3, tuned to follow instructions and generate responses in Japanese and English.

🚀 Quick Start

The Llama 3 Youko 8B Instruct model is an instruction-tuned version of rinna/llama-3-youko-8b. It uses supervised fine-tuning (SFT), Chat Vector, and direct preference optimization (DPO), adopting the Llama-3 chat format.

✨ Features

Model Architecture

It is a 32-layer, 4096-hidden-size transformer-based language model. For detailed architecture information, refer to the Llama 3 Model Card.

Training

Built with Meta Llama 3:
- Supervised fine-tuning: The supervised fine-tuning data is a subset of multiple datasets, including CohereForAI/aya_dataset, FLAN, and many others.
- Model merging: The fine-tuned model (llama-3-youko-8b-sft) is enhanced by adding a chat vector. The chat vector is obtained by subtracting the parameter vectors of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) from those of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
```
llama-3-youko-8b-sft + 0.5 * (meta-llama/Meta-Llama-3-8B-Instruct - meta-llama/Meta-Llama-3-8B)
```
The embedding layer is skipped when subtracting and adding the parameter vectors.
- Direct preference optimization: Applied with a subset of datasets such as [kunishou/HelpSteer-35k-ja](https://huggingface.co/datasets/kunishou/HelpSteer-35k-ja) to build the instruct model.
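
The model-merging formula above can be sketched in a few lines. This is a minimal illustration, not rinna's actual merging script: the function name `add_chat_vector`, the `skip` predicate, and the assumption that embedding parameters are the ones whose name contains `embed_tokens` are all hypothetical. The toy values are plain floats standing in for weight tensors; the same arithmetic applies to any type that supports `+`, `-`, and scalar `*`.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")  # float, NumPy array, torch tensor: anything with +, -, scalar *


def add_chat_vector(
    sft: Dict[str, T],
    base: Dict[str, T],
    instruct: Dict[str, T],
    ratio: float = 0.5,
    # Assumption: embedding parameters are identified by name.
    skip: Callable[[str], bool] = lambda name: "embed_tokens" in name,
) -> Dict[str, T]:
    """Return sft + ratio * (instruct - base), leaving skipped parameters untouched."""
    merged = {}
    for name, weight in sft.items():
        if skip(name):
            # Embedding layer: keep the fine-tuned weights as they are.
            merged[name] = weight
        else:
            merged[name] = weight + ratio * (instruct[name] - base[name])
    return merged


# Toy example with scalars standing in for weight tensors.
sft = {"model.embed_tokens.weight": 1.0, "model.layers.0.mlp.up_proj.weight": 2.0}
base = {"model.embed_tokens.weight": 0.0, "model.layers.0.mlp.up_proj.weight": 1.0}
instruct = {"model.embed_tokens.weight": 5.0, "model.layers.0.mlp.up_proj.weight": 3.0}

merged = add_chat_vector(sft, base, instruct)
print(merged)
```

In the toy example the embedding entry is copied through unchanged, while the other parameter moves halfway along the direction from the base model to the instruct model.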

Contributors

Xinqi Chen
[Koh Mitsuda](https://huggingface.co/mitsu-koh)
[Toshiaki Wakatsuki](https://huggingface.co/t-w)
Kei Sawada

Release Date

July 25, 2024

📦 Installation

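The original model card lists no dedicated installation steps. The usage example below imports `torch` and `transformers`, so both packages need to be installed first; `device_map="auto"` additionally relies on the `accelerate` package.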

💻 Usage Examples

Basic Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "rinna/llama-3-youko-8b-instruct"

# Load the tokenizer and the model weights in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# System prompt: "You are a sincere and excellent assistant. Please answer concisely and honestly."
# User prompt: "What kind of person was Kitaro Nishida?"
messages = [
    {"role": "system", "content": "あなたは誠実で優秀なアシスタントです。どうか、簡潔かつ正直に答えてください。"},
    {"role": "user", "content": "西田幾多郎とはどんな人物ですか？"},
]

# Render the conversation in the Llama-3 chat format and open an assistant turn.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop generation at either end-of-text or end-of-turn.
terminators = [
    tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.1,
)

# Keep only the newly generated tokens, dropping the prompt.
response = outputs[0][input_ids.shape[-1]:]
response = tokenizer.decode(response, skip_special_tokens=True)
print(response)
```
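
To make the chat format concrete, here is a hand-written approximation of what `apply_chat_template` produces for a Llama-3 style tokenizer. The helper `render_llama3_prompt` is hypothetical and for illustration only; the template shipped with the tokenizer remains authoritative. It also shows why `<|eot_id|>` appears among the terminators: that token closes every turn.

```python
from typing import Dict, List


def render_llama3_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hand-written approximation of the Llama-3 chat format (illustrative only)."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the content, and an end-of-turn token.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += message["content"].strip() + "<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


example = render_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who was Kitaro Nishida?"},
])
print(example)
```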

Advanced Usage

The instruction-tuned model tends to generate repeated text more often than its base counterpart. Setting `repetition_penalty=1.1` therefore helps generation quality. The same repetition penalty was applied in the evaluation experiments.
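
As a rough picture of what a repetition penalty does, the sketch below rescales the scores of tokens that have already appeared: positive scores are divided by the penalty and negative scores are multiplied by it, so both become less likely. This follows the commonly used formulation; the function name `apply_repetition_penalty` and the list-based logits are assumptions made for illustration and are not taken from the model card.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float],
    seen_token_ids: Iterable[int],
    penalty: float = 1.1,
) -> List[float]:
    """Lower the score of every token id that has already appeared."""
    penalized = list(logits)
    for token_id in set(seen_token_ids):
        score = penalized[token_id]
        # Dividing a positive score or multiplying a negative one both push it down.
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


# Token ids 0 and 1 were already generated; token id 2 was not.
scores = apply_repetition_penalty([2.2, -1.0, 0.5], seen_token_ids=[0, 1, 1])
print(scores)
```

With a penalty of 1.0 the scores are unchanged; values above 1.0 discourage repetition more strongly.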

📚 Documentation

Benchmarking

Refer to rinna's LM benchmark page (Sheet 20240725).

Tokenization

The model uses the original [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) tokenizer.

How to Cite

@misc{rinna-llama-3-youko-8b-instruct,
    title = {rinna/llama-3-youko-8b-instruct},
    author = {Chen, Xinqi and Mitsuda, Koh and Wakatsuki, Toshiaki and Sawada, Kei},
    url = {https://huggingface.co/rinna/llama-3-youko-8b-instruct}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}

References

@article{llama3modelcard,
    title = {Llama 3 Model Card},
    author = {AI@Meta},
    year = {2024},
    url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}

@article{huang2023chat,
    title = {Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages},
    author = {Huang, Shih-Cheng and Li, Pin-Zu and Hsu, Yu-Chi and Chen, Kuang-Ming and Lin, Yu Tung and Hsiao, Shih-Kai and Tzong-Han Tsai, Richard and Lee, Hung-yi},
    year = {2023},
    url = {https://arxiv.org/abs/2310.04799}
}

🔧 Technical Details

| Property | Details |
| --- | --- |
| Model Type | A 32-layer, 4096-hidden-size transformer-based language model |
| Training Data | Subsets of multiple datasets, including CohereForAI/aya_dataset, FLAN, etc. |

📄 License

Meta Llama 3 Community License
