Guanaco-leh-V2: A Multilingual Instruction-Following Language Model Based on LLaMA 7B
Guanaco-leh-V2 is a multilingual instruction-following language model based on LLaMA 7B, trained with specific techniques to enhance performance in multiple languages and chatbot scenarios.
Quick Start
This model is trained with guanaco-lora, where the LoRA weights together with embed_tokens and lm_head are trained. The dataset is sourced from alpaca-cleaned and guanaco.
With trained embeddings and heads, the model performs better in Chinese and Japanese than the original LLaMA, especially when using instruction-based prompts. This makes the model easier to use.
Since this model is trained on the guanaco dataset, it can also be used as a chatbot. Use the following format:
### Instruction:
User: <Message history>
Assistant: <Message history>
### Input:
System: <System response for next message, optional>
User: <Next message>
### Response:
Important Note
The first line of the original prompt template was removed during training to reduce token consumption, so consider removing it as well when using this model.
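As a rough illustration of the format above, a prompt could be assembled like this in Python. The helper below is hypothetical and not part of the released code; only the section markers come from the format itself:

```python
def build_prompt(history, user_message, system=None):
    """Assemble a prompt in the chat format shown above.

    history: list of (user, assistant) message pairs already exchanged.
    system: optional system text guiding the next reply.
    Note: no preamble line is added, per the note above.
    """
    instruction = ""
    for user_turn, assistant_turn in history:
        instruction += f"User: {user_turn}\nAssistant: {assistant_turn}\n"

    input_block = f"System: {system}\n" if system else ""
    input_block += f"User: {user_message}\n"

    return (
        "### Instruction:\n" + instruction +
        "### Input:\n" + input_block +
        "### Response:\n"
    )


print(build_prompt([("Hello!", "Hi! How can I help you today?")], "Please introduce yourself."))
```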
Features
Differences from the previous model
The main differences are:
- The model is trained in bf16 instead of 8-bit.
- The ctx cut-off length is increased to 1024.
- A larger dataset is used (latest guanaco + alpaca cleaned = 540k entries).
- A larger batch size is used (64 -> 128).
Since the training data contains more chat-based data, this model is more suitable for chatbot usage.
Try this model
You can try this model with this Colab, or use generate.py from guanaco-lora. All the examples were generated with guanaco-lora.
If you want to use the lora model from guanaco-7b-leh-v2-adapter/, remember to turn off load_in_8bit, or manually merge it into the 7B model!
Usage Tip
Recommended generation parameters:
- Temperature: 0.5 - 0.7
- Top p: 0.65 - 1.0
- Top k: 30 - 50
- Repeat penalty: 1.03 - 1.17
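For example, with Hugging Face transformers these ranges map onto generation settings like the following. This is only a sketch; the specific values are one possible choice from the middle of the recommended ranges:

```python
from transformers import GenerationConfig

# One possible choice within the recommended ranges above.
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    max_new_tokens=256,
)
```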
Technical Details
Training Setup
- 2 x RTX 3090 with model parallelism
- Batch size = 8 (per step) * 16 (gradient accumulation) = 128
- ctx cut-off length = 1024
- Only train on output (with loss mask)
- Group-by-length batching enabled
- 538k entries, 2 epochs (about 8400 steps)
- lr 2e-4
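For orientation, the settings above map roughly onto Hugging Face TrainingArguments as sketched below. This is not the exact guanaco-lora configuration, just an illustration of how the listed hyperparameters fit together:

```python
from transformers import TrainingArguments

# Rough mapping of the settings above; the actual training script may differ in details.
training_args = TrainingArguments(
    output_dir="guanaco-7b-leh-v2",     # placeholder output path
    per_device_train_batch_size=8,      # bsz 8
    gradient_accumulation_steps=16,     # 8 * 16 = effective batch size 128
    learning_rate=2e-4,                 # lr 2e-4
    num_train_epochs=2,                 # ~8400 steps over 538k entries
    bf16=True,                          # trained in bf16 instead of 8-bit
    group_by_length=True,               # group-by-length batching
)
# The ctx cut-off length of 1024 is applied at tokenization time, not here.
```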
Why use lora+embed+head
First, it is obvious that when a large language model (LLM) is not proficient in a certain language and you want to fine-tune it, you should train the embedding and head parts.
But the question is: "Why not just perform native fine-tuning?"
If you have looked at other alpaca models or training materials, you may have noticed a common problem: "memorization". The loss drops sharply at the beginning of each epoch, which is a form of overfitting.
In my opinion, this is because the number of parameters in LLaMA is too large, so it simply memorizes all the training data.
However, when LoRA is applied only to the attention layers (leaving the MLP layers untouched), the number of trainable parameters is small enough that the model cannot simply memorize the training data, so it is much less prone to this behaviour.
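A sketch of what such a setup could look like with the PEFT library follows. The rank, alpha, dropout, and target module names are assumptions for illustration, since the exact guanaco-lora settings are not listed here:

```python
from peft import LoraConfig

# Illustrative only: LoRA on the attention projections, with the embedding and
# LM head trained in full via modules_to_save. Rank/alpha/dropout are assumed values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention only, MLP left untouched
    modules_to_save=["embed_tokens", "lm_head"],              # fully trained alongside the LoRA weights
    task_type="CAUSAL_LM",
)
```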
Usage Examples
Basic Usage
You can try this model with the provided colab.
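If you prefer to run it locally with Hugging Face transformers, a minimal sketch might look like this. The model path is a placeholder; point it at your merged Guanaco-leh-V2 checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: adjust to wherever your merged checkpoint lives.
model_path = "path/to/guanaco-7b-leh-v2"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Prompt in the format described in the Quick Start section.
prompt = (
    "### Instruction:\n"
    "User: Hello!\n"
    "Assistant: Hi! How can I help you today?\n"
    "### Input:\n"
    "User: Please introduce yourself.\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,       # values chosen from the recommended ranges
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```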
Advanced Usage
If you want to use the lora model from guanaco-7b-leh-v2-adapter/, remember to turn off load_in_8bit, or manually merge it into the 7B model.
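A rough sketch of merging the adapter into the base model with the PEFT library is shown below. The paths are placeholders, and the base model is loaded without load_in_8bit, as noted above:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths: adjust to your local LLaMA 7B weights and the adapter directory.
base_model_path = "path/to/llama-7b-hf"
adapter_path = "guanaco-7b-leh-v2-adapter"

# Note: no load_in_8bit here; merging requires the full-precision base model.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)

# Fold the LoRA weights into the base model for standalone use.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("guanaco-7b-leh-v2-merged")

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
tokenizer.save_pretrained("guanaco-7b-leh-v2-merged")
```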
Documentation
Some Examples
As shown in the following images, although Guanaco can reply fluently, the content can be quite confusing, so you may want to add some context in the System part of the prompt.

I used Guanaco with instructions to translate a Chinese article into Japanese, German, and English. Then I used GPT-4 to score them and obtained the following results:

License
This project is licensed under the GPL-3.0 license.
Information Table
| Property | Details |
|----------|---------|
| Model Type | Multilingual instruction-following language model based on LLaMA 7B |
| Training Data | alpaca-cleaned, guanaco |
| Supported Languages | English, Chinese, Japanese |
| Tags | llama, guanaco, alpaca, lora, finetune |