LLaMA 3 Youko QLoRA Fine-tune for Japanese-English Translation
This project is a QLoRA fine-tune of LLaMA 3 Youko trained on a new version of the VNTL dataset. It aims to improve how well large language models (LLMs) translate Japanese visual novels into English, with more accurate and stable output than the previous version.
License
Dataset
- Datasets: lmg-anon/VNTL-v5-1k
Language
- Supported Languages: ja, en
Base Model
- Base Model: rinna/llama-3-youko-8b
Pipeline Tag
- Pipeline Tag: translation
Quick Start
The model is meant purely for translating Japanese visual novel text into English: unlike the previous version, it does not include a "chat mode".
Features
- Enhanced Performance: The dataset behind this version of VNTL 8B was rebuilt and expanded from scratch. The model outperforms the previous version in accuracy and stability, making far fewer mistakes even at high temperatures.
- Prompt Format Change: Switched to the default LLaMA 3 prompt format, since users had trouble with the previous custom format.
- Multi-line Translation Support: Added proper support for multi-line translations; the old version only handled single lines.
- Higher Translation Accuracy: Overall translation accuracy is better, although translations tend to be more literal than in the previous version.
Technical Details
Training Parameters
This fine-tune uses similar hyperparameters to the previous version; the main difference is the brand-new dataset.
| Parameter | Value |
|---|---|
| Rank | 128 |
| Alpha | 32 |
| Effective Batch Size | 45 |
| Warmup Ratio | 0.02 |
| Learning Rate | 6e-5 |
| Embedding Learning Rate | 1e-5 |
| Optimizer | grokadamw |
| LR Schedule | cosine |
| Weight Decay | 0.01 |
Train Loss: 0.42
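The warmup ratio and cosine schedule in the table can be made concrete with a small sketch. It assumes the common shape (linear warmup from 0 to the peak learning rate over the first 2% of steps, then cosine decay to 0); the card does not state the exact schedule implementation, and `total_steps` below is a hypothetical value, since the number of training steps is not given.

```python
import math

# Values taken from the table above.
PEAK_LR = 6e-5             # Learning Rate
EMBEDDING_PEAK_LR = 1e-5   # Embedding Learning Rate
WARMUP_RATIO = 0.02        # Warmup Ratio


def lr_at_step(step: int, total_steps: int, peak_lr: float,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` under linear warmup + cosine decay (assumed shape)."""
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear warmup: ramps from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr at the end of warmup down to 0 at total_steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the real step count is not stated
    for step in (0, 10, 20, 510, 1000):
        print(step, lr_at_step(step, total_steps, PEAK_LR),
              lr_at_step(step, total_steps, EMBEDDING_PEAK_LR))
```

Under these assumptions, a 1000-step run would reach the peak learning rate at step 20 and fall to half of it midway through the decay; the embedding parameters follow the same curve scaled to their lower peak.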
Documentation
Notes
For this new version of VNTL 8B, the dataset was rebuilt and expanded from scratch. The model outperforms the previous version in accuracy and stability and makes far fewer mistakes even at high temperatures (though temperature 0 is still recommended for the best accuracy).
Some major changes in this version:
- Switched to the default LLaMA 3 prompt format, since people had trouble with the custom one
- Added proper support for multi-line translations (the old version only handled single lines)
- Overall better translation accuracy
One thing to note: while the translations are more accurate, they tend to be more literal than in the previous version.
Sampling Recommendations
Usage Tip
For optimal results, it's highly recommended to use neutral sampling parameters (temperature 0 with no repetition penalty) when using this model.
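The sketch below illustrates why temperature 0 is the deterministic choice: most inference front-ends treat it as greedy decoding, always taking the highest-scoring token, although exact behavior depends on the backend. It is a generic illustration over a plain list of logits; `pick_next_token` is a hypothetical name, not any backend's API. No repetition penalty is applied, matching the "neutral" recommendation.

```python
import math
import random
from typing import Optional, Sequence


def pick_next_token(logits: Sequence[float], temperature: float,
                    rng: Optional[random.Random] = None) -> int:
    """Return the index of the next token.

    temperature == 0 is treated as greedy decoding (argmax).
    No repetition penalty: the logits are used unmodified.
    """
    if temperature == 0:
        # Greedy: always the single most likely token, so output is reproducible.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    # Sample proportionally to softmax(logits / temperature).
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [0.1, 2.5, 1.0]
    print(pick_next_token(logits, 0))                        # greedy choice
    print(pick_next_token(logits, 1.0, random.Random(0)))    # sampled choice
```

Higher temperatures flatten the distribution and make less likely tokens more probable, which is where translation mistakes tend to creep in.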
Translation Prompt
This fine-tune uses the LLaMA 3 prompt format. Here is an example prompt for translation:
```
<|begin_of_text|><|start_header_id|>Metadata<|end_header_id|>

[character] Name: Uryuu Shingo (瓜生 新吾) | Gender: Male | Aliases: Onii-chan (お兄ちゃん)
[character] Name: Uryuu Sakuno (瓜生 桜乃) | Gender: Female<|eot_id|><|start_header_id|>Japanese<|end_header_id|>

[桜乃]: 『……ごめん』<|eot_id|><|start_header_id|>English<|end_header_id|>

[Sakuno]: 『... Sorry.』<|eot_id|><|start_header_id|>Japanese<|end_header_id|>

[新吾]: 「ううん、こう言っちゃなんだけど、迷子でよかったよ。桜乃は可愛いから、いろいろ心配しちゃってたんだぞ俺」<|eot_id|><|start_header_id|>English<|end_header_id|>
```
The generated translation for that prompt, with temperature 0, is:
```
[Shingo]: "Nah, I know it’s weird to say this, but I’m glad you got lost. You’re so cute, Sakuno, so I was really worried about you."
```
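When translating a script line by line, a prompt in this format can be assembled programmatically. The following is a minimal sketch, assuming the standard LLaMA 3 convention of a blank line after each `<|end_header_id|>`; `build_vntl_prompt` and `header` are hypothetical helper names, not part of any library. Already-translated lines are passed back in as Japanese/English pairs for context, and the prompt ends with an open English header so the model writes the next translation.

```python
from typing import Sequence, Tuple


def header(role: str) -> str:
    # LLaMA 3 header: role between special tokens, followed by a blank line.
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def build_vntl_prompt(metadata: Sequence[str],
                      history: Sequence[Tuple[str, str]],
                      source_line: str) -> str:
    """Assemble a translation prompt (hypothetical helper).

    metadata:    lines such as "[character] Name: ... | Gender: ..."
    history:     (japanese, english) pairs already translated, used as context
    source_line: the Japanese line to translate next
    """
    parts = ["<|begin_of_text|>", header("Metadata"), "\n".join(metadata), "<|eot_id|>"]
    for japanese, english in history:
        parts += [header("Japanese"), japanese, "<|eot_id|>",
                  header("English"), english, "<|eot_id|>"]
    # Leave the final English turn open so the model generates the translation.
    parts += [header("Japanese"), source_line, "<|eot_id|>", header("English")]
    return "".join(parts)


prompt = build_vntl_prompt(
    metadata=["[character] Name: Uryuu Sakuno (瓜生 桜乃) | Gender: Female"],
    history=[("[桜乃]: 『……ごめん』", "[Sakuno]: 『... Sorry.』")],
    source_line="[新吾]: 「ううん、こう言っちゃなんだけど、迷子でよかったよ。桜乃は可愛いから、いろいろ心配しちゃってたんだぞ俺」",
)
print(prompt)
```

Generation should be stopped at the next `<|eot_id|>`. If your tokenizer prepends the `<|begin_of_text|>` token automatically, drop it from the string to avoid a duplicate.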
Trivia
The Metadata section isn't limited to character information: you can also add trivia and teach the model the correct reading (romanization) of words it struggles with.
Here's an example:
```
<|begin_of_text|><|start_header_id|>Metadata<|end_header_id|>

[character] Name: Uryuu Shingo (瓜生 新吾) | Gender: Male | Aliases: Onii-chan (お兄ちゃん)
[character] Name: Uryuu Sakuno (瓜生 桜乃) | Gender: Female
[element] Name: Murasamemaru (叢雨丸) | Type: Quality<|eot_id|><|start_header_id|>Japanese<|end_header_id|>

[桜乃]: 『……ごめん』<|eot_id|><|start_header_id|>English<|end_header_id|>

[Sakuno]: 『... Sorry.』<|eot_id|><|start_header_id|>Japanese<|end_header_id|>

[新吾]: 「ううん、こう言っちゃなんだけど、迷子でよかったよ。桜乃は叢雨丸いから、いろいろ心配しちゃってたんだぞ俺」<|eot_id|><|start_header_id|>English<|end_header_id|>
```
The generated translation for that prompt, with temperature 0, is:
```
[Shingo]: "Nah, I know it’s not the best thing to say, but I’m glad you got lost. Sakuno’s Murasamemaru, so I was really worried about you, you know?"
```
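Metadata entries follow a simple `Field: value | Field: value` pattern, so they are easy to generate from a glossary. This is a minimal sketch with hypothetical helper names; joining multiple aliases with a comma is an assumption, since the examples only show a single alias.

```python
from typing import Sequence, Tuple


def character_entry(name: str, original: str, gender: str,
                    aliases: Sequence[Tuple[str, str]] = ()) -> str:
    """Format a [character] metadata line; each alias is (romanized, original)."""
    fields = [f"Name: {name} ({original})", f"Gender: {gender}"]
    if aliases:
        # Assumption: multiple aliases are joined with ", ".
        fields.append("Aliases: " + ", ".join(f"{a} ({o})" for a, o in aliases))
    return "[character] " + " | ".join(fields)


def element_entry(name: str, original: str, kind: str) -> str:
    """Format an [element] metadata line for a term the model should render consistently."""
    return f"[element] Name: {name} ({original}) | Type: {kind}"


metadata = [
    character_entry("Uryuu Shingo", "瓜生 新吾", "Male", [("Onii-chan", "お兄ちゃん")]),
    character_entry("Uryuu Sakuno", "瓜生 桜乃", "Female"),
    element_entry("Murasamemaru", "叢雨丸", "Quality"),
]
print("\n".join(metadata))
```

The printed lines reproduce the Metadata block of the trivia example and can be placed directly after the `Metadata` header.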