🚀 LlaMaestra - A tiny Llama model tuned for text translation
LlaMaestra is a small Llama model fine-tuned specifically for text translation between English and Italian, offering efficient, direct translation capabilities.
⚠️ Important Note
For EN-IT translation, check out the lightweight Quadrifoglio models as well.
🚀 Quick Start
The model loads directly with Hugging Face `transformers`; see the usage example below for a complete translation call.
✨ Features
- Fine-Tuned for Translation: The model was fine-tuned on roughly 300,000 examples of English-Italian and Italian-English translations.
- Direct Translation: It is designed to return the translation directly, without extra explanation.
- CPU-Friendly: Thanks to its small size (1B parameters), it runs well on CPUs.
📦 Installation
The model runs with the Hugging Face `transformers` library and PyTorch; no model-specific installation steps are required.
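Assuming a standard Python environment, the dependencies used in the usage example can be installed with pip:

```shell
pip install torch transformers
```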
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "LeonardPuettmann/LlaMaestra-3.2-1B-Instruct-v0.1"

# device_map="auto" places the model on a GPU if available, otherwise on the CPU
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, add_bos_token=True, trust_remote_code=True)

messages = [
    {"role": "system", "content": "Your job is to return translations for sentences or words from either Italian to English or English to Italian."},
    {"role": "user", "content": "Do you sell tickets for the bus?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Move the inputs to whatever device the model ended up on (GPU or CPU)
model_input = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**model_input, max_new_tokens=1024)
print(tokenizer.decode(output[0]))
```
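The raw decoded output still contains the chat-template special tokens. Assuming the standard Llama 3.2 chat format (`<|start_header_id|>assistant<|end_header_id|>` ... `<|eot_id|>`), a small hypothetical helper can pull out just the translation:

```python
def extract_assistant_reply(decoded: str) -> str:
    """Return the assistant's reply from Llama-3.2-style chat output.

    Assumes the decoded string still contains the special tokens,
    i.e. decode was called without skip_special_tokens=True.
    """
    marker = "<|start_header_id|>assistant<|end_header_id|>"
    # Take everything after the last assistant header and drop the end-of-turn token
    tail = decoded.rsplit(marker, 1)[-1]
    return tail.replace("<|eot_id|>", "").strip()
```

Alternatively, passing `skip_special_tokens=True` to `tokenizer.decode` strips the markers entirely, at the cost of also discarding the turn boundaries.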
📚 Documentation
Model Card
This model was fine-tuned on roughly 300,000 examples of English-to-Italian and Italian-to-English translations. It was tuned to return the translation directly, without much surrounding explanation.
Fine-tuning took about 10 hours on an NVIDIA A10G GPU.
Due to its small size, the model runs well on CPUs.
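The exact fine-tuning data format is not documented here, but a plausible sketch, reusing the system prompt from the usage example, maps each sentence pair to a chat-style training example (the helper name is hypothetical):

```python
def to_chat_example(source: str, target: str) -> list[dict]:
    """Map one sentence pair to a chat-format training example (assumed format)."""
    system = (
        "Your job is to return translations for sentences or words "
        "from either Italian to English or English to Italian."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": source},
        {"role": "assistant", "content": target},
    ]
```

Examples in this shape can be passed through `tokenizer.apply_chat_template` to produce the training text.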

Data used
The source for the data was sentence pairs from tatoeba.org. The data can be downloaded from here: https://tatoeba.org/downloads
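Tatoeba distributes sentences and translation links as separate tab-separated exports (`sentences.csv`: id, language, text; `links.csv`: sentence id, translation id). A minimal sketch of joining them into English-Italian pairs, assuming that export layout:

```python
import csv
import io


def load_sentences(tsv_text: str) -> dict[int, tuple[str, str]]:
    """Parse a Tatoeba sentences export: one 'id<TAB>lang<TAB>text' row per line."""
    sentences = {}
    for sid, lang, text in csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE):
        sentences[int(sid)] = (lang, text)
    return sentences


def build_pairs(sentences: dict, links_text: str, src: str = "eng", tgt: str = "ita") -> list:
    """Join the links export ('source_id<TAB>translation_id') into (src, tgt) text pairs."""
    pairs = []
    for a, b in csv.reader(io.StringIO(links_text), delimiter="\t", quoting=csv.QUOTE_NONE):
        a, b = int(a), int(b)
        # Keep only links where both sides exist and the languages match the wanted direction
        if a in sentences and b in sentences:
            if sentences[a][0] == src and sentences[b][0] == tgt:
                pairs.append((sentences[a][1], sentences[b][1]))
    return pairs
```

In practice the full exports are read from the downloaded files rather than in-memory strings; the join logic stays the same.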
🔧 Technical Details
The model is based on the meta-llama/Llama-3.2-1B-Instruct base model and was fine-tuned for text translation.
📄 License
The model uses the Llama 3.2 license.
📄 Credits