🚀 Llama 3 Youko 8B Instruct (rinna/llama-3-youko-8b-instruct)
This is an instruction-tuned language model based on Llama 3, offering enhanced performance in handling instructions and generating responses.
🚀 Quick Start
The Llama 3 Youko 8B Instruct model is an instruction-tuned version of rinna/llama-3-youko-8b. It is built using supervised fine-tuning (SFT), Chat Vector, and direct preference optimization (DPO), and it adopts the Llama-3 chat format.
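Of the three techniques, Chat Vector is the easiest to show in a few lines. Following the Chat Vector paper listed in the references, the weight difference between an instruction-tuned model and its base model is added to a model that was continually pre-trained from the same base. The sketch below uses toy weights in plain Python; the parameter names, the values, and the `scale` argument are illustrative assumptions, and it does not claim to reproduce the exact recipe used for this model.

```python
def chat_vector(instruct_weights, base_weights):
    """Return the per-parameter difference (instruct - base)."""
    return {
        name: [i - b for i, b in zip(instruct_weights[name], base_weights[name])]
        for name in base_weights
    }


def apply_chat_vector(target_weights, vector, scale=1.0):
    """Add a chat vector to the target weights (scale is a hypothetical knob)."""
    return {
        name: [t + scale * v for t, v in zip(target_weights[name], vector[name])]
        for name in target_weights
    }


# Toy stand-ins for real checkpoints (hypothetical names and values).
base = {"layer.weight": [1.0, 2.0, 3.0]}
instruct = {"layer.weight": [1.5, 2.0, 2.0]}
continued = {"layer.weight": [0.0, 4.0, 3.0]}  # base after continual pre-training

vector = chat_vector(instruct, base)
merged = apply_chat_vector(continued, vector)
print(merged)
```

With real checkpoints the same arithmetic would run over every tensor in the state dict, which only works when all three models share the same architecture and parameter shapes.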
✨ Features
Model Architecture
It is a 32-layer, 4096-hidden-size transformer-based language model. For detailed architecture information, refer to the Llama 3 Model Card.
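As a rough sanity check on the "8B" in the model name, the parameter count can be estimated from these two figures plus a few further values. The vocabulary size, feed-forward width, head counts, and untied output layer below are assumptions taken from the public Llama 3 8B configuration, not from this document, so treat the result as an estimate.

```python
def estimate_params(
    num_layers=32,            # from the description above
    hidden_size=4096,         # from the description above
    vocab_size=128256,        # assumption: Llama 3 vocabulary size
    intermediate_size=14336,  # assumption: feed-forward width
    num_heads=32,             # assumption: attention heads
    num_kv_heads=8,           # assumption: grouped-query attention
):
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim
    # Query and output projections are square; key and value are narrower.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    mlp = 3 * hidden_size * intermediate_size  # gate, up, and down projections
    norms = 2 * hidden_size                    # two RMSNorm weights per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab_size * hidden_size  # input embeddings + untied output head
    final_norm = hidden_size
    return num_layers * per_layer + embeddings + final_norm


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```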
Training
Contributors
- Xinqi Chen
- [Koh Mitsuda](https://huggingface.co/mitsu-koh)
- [Toshiaki Wakatsuki](https://huggingface.co/t-w)
- Kei Sawada
Release Date
July 25, 2024
📦 Installation
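The original model card gives no dedicated installation steps. The example under Usage Examples imports torch and transformers, so both packages need to be installed; loading with device_map="auto" additionally relies on the accelerate package.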
💻 Usage Examples
Basic Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "rinna/llama-3-youko-8b-instruct"

# Load the tokenizer and the model in bfloat16, placing weights automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # System: "You are a sincere and excellent assistant. Please answer concisely and honestly."
    {"role": "system", "content": "あなたは誠実で優秀なアシスタントです。どうか、簡潔かつ正直に答えてください。"},
    # User: "What kind of person was Kitaro Nishida?"
    {"role": "user", "content": "西田幾多郎とはどんな人物ですか?"},
]

# Render the messages in the Llama-3 chat format and tokenize them.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop generation at either end-of-text or end-of-turn.
terminators = [
    tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.1,
)

# Drop the prompt tokens and decode only the newly generated response.
response = outputs[0][input_ids.shape[-1]:]
response = tokenizer.decode(response, skip_special_tokens=True)
print(response)
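For reference, the apply_chat_template call above turns the message list into a single prompt string in the Llama-3 chat format. The following sketch rebuilds that string by hand to show the layout. The helper name is hypothetical, and the special tokens are written out on the assumption that the tokenizer follows the standard Llama 3 Instruct template, so the tokenizer's own template remains the authoritative source.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Hypothetical helper: hand-written approximation of the Llama-3 chat layout."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the content, and an end-of-turn token.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who was Kitaro Nishida?"},
]
prompt = render_llama3_prompt(messages)
print(prompt)
```

Every turn ends with `<|eot_id|>`, which is why that token appears in the terminators list alongside `<|end_of_text|>`.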
Advanced Usage
The instruction-tuned model tends to generate repeated text more often than its base counterpart. Setting repetition_penalty=1.1 can therefore improve generation quality. The same repetition penalty was applied in the evaluation experiments.
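To make the effect concrete, here is a minimal sketch of the usual repetition-penalty rule: the logit of every token that has already appeared is divided by the penalty when it is positive and multiplied by it when it is negative, which lowers that token's probability either way. This is an illustration in plain Python, assuming the rule used by common implementations such as the one in transformers; the function name and the toy values are hypothetical.

```python
def apply_repetition_penalty(logits, previous_token_ids, penalty=1.1):
    """Return a copy of logits with already-seen tokens made less likely."""
    penalized = list(logits)
    for token_id in set(previous_token_ids):
        score = penalized[token_id]
        # Dividing a positive score or multiplying a negative one both reduce it.
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


logits = [2.2, -1.0, 0.5, 3.0]  # toy scores for a 4-token vocabulary
seen = [0, 1, 0]                # token ids generated so far
penalized = apply_repetition_penalty(logits, seen)
print(penalized)
```

A penalty of 1.0 leaves the logits unchanged, so 1.1 is a fairly mild setting; tokens that have not appeared yet are never touched.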
📚 Documentation
Benchmarking
Refer to rinna's LM benchmark page (Sheet 20240725).
Tokenization
The model uses the original [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) tokenizer.
How to Cite
@misc{rinna-llama-3-youko-8b-instruct,
title = {rinna/llama-3-youko-8b-instruct},
author = {Chen, Xinqi and Mitsuda, Koh and Wakatsuki, Toshiaki and Sawada, Kei},
url = {https://huggingface.co/rinna/llama-3-youko-8b-instruct}
}
@inproceedings{sawada2024release,
title = {Release of Pre-Trained Models for the {J}apanese Language},
author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
month = {5},
year = {2024},
pages = {13898--13905},
url = {https://aclanthology.org/2024.lrec-main.1213},
note = {\url{https://arxiv.org/abs/2404.01657}}
}
References
@article{llama3modelcard,
title = {Llama 3 Model Card},
author = {AI@Meta},
year = {2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
@article{huang2023chat,
title = {Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages},
author = {Huang, Shih-Cheng and Li, Pin-Zu and Hsu, Yu-Chi and Chen, Kuang-Ming and Lin, Yu Tung and Hsiao, Shih-Kai and Tzong-Han Tsai, Richard and Lee, Hung-yi},
year = {2023},
url = {https://arxiv.org/abs/2310.04799}
}
🔧 Technical Details
| Property | Details |
|---|---|
| Model Type | A 32-layer, 4096-hidden-size transformer-based language model |
| Training Data | Subsets of multiple datasets, including CohereForAI/aya_dataset, FLAN, etc. |
📄 License
Meta Llama 3 Community License