---
library_name: transformers
license: cc-by-nc-4.0
base_model: atlasia/Terjman-Nano-v1
metrics:
- bleu
- chrf
- ter
model-index:
- name: Terjman-Nano-v2.0
  results: []
datasets:
- BounharAbdelaziz/Terjman-v2-English-Darija-Dataset-350K
language:
- ary
- en
pipeline_tag: translation
---
# 🇲🇦 Terjman-Nano-v2.0 (77M) 🚀

Terjman-Nano-v2.0 is an improved version of atlasia/Terjman-Nano-v1, built on the Transformer architecture and fine-tuned for high-quality, accurate English to Moroccan Darija translation.

This version has been trained on a larger and more refined dataset than its predecessor, leading to improved translation performance. It achieves results on par with gpt-4o-2024-08-06 on TerjamaBench, an evaluation benchmark for English-Moroccan Darija translation models that emphasizes culturally specific content.
## 🚀 Features

✅ Fine-tuned for English → Moroccan Darija translation.
✅ State-of-the-art performance among open-source models.
✅ Compatible with 🤗 Transformers and easily deployable on various hardware setups.
## 🔥 Performance Comparison

The following table compares Terjman-Nano-v2.0 against proprietary and open-source models on TerjamaBench using BLEU, chrF, and TER scores. Higher BLEU/chrF and lower TER indicate better translation quality. A sketch of how these metrics can be computed follows the table.
| Model | Size | BLEU↑ | chrF↑ | TER↓ |
|---|---|---|---|---|
| **Proprietary Models** | | | | |
| gemini-exp-1206 | * | 30.69 | 54.16 | 67.62 |
| claude-3-5-sonnet-20241022 | * | 30.51 | 51.80 | 67.42 |
| gpt-4o-2024-08-06 | * | 28.30 | 50.13 | 71.77 |
| **Open-Source Models** | | | | |
| Terjman-Ultra-v2.0 | 1.3B | 25.00 | 44.70 | 77.20 |
| Terjman-Supreme-v2.0 | 3.3B | 23.43 | 44.57 | 78.17 |
| Terjman-Large-v2.0 | 240M | 22.67 | 42.57 | 83.00 |
| **Terjman-Nano-v2.0 (This model)** | 77M | 18.84 | 38.41 | 94.73 |
| atlasia/Terjman-Large-v1.2 | 240M | 16.33 | 37.10 | 89.13 |
| MBZUAI-Paris/Atlas-Chat-9B | 9B | 14.80 | 35.26 | 93.95 |
| facebook/nllb-200-3.3B | 3.3B | 14.76 | 34.17 | 94.33 |
| atlasia/Terjman-Nano | 77M | 09.98 | 26.55 | 106.49 |
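For reference, here is a minimal sketch of how BLEU, chrF, and TER can be computed with the sacrebleu library. The sentence pairs are placeholders and this is not the exact TerjamaBench evaluation script, just an illustration of the metrics.

```python
# Minimal sketch: computing BLEU, chrF, and TER with sacrebleu (pip install sacrebleu).
# The hypothesis/reference pairs are placeholders, not benchmark data.
import sacrebleu

hypotheses = ["model translation 1", "model translation 2"]            # system outputs
references = [["reference translation 1", "reference translation 2"]]  # one list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)

print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}  TER: {ter.score:.2f}")
```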
## 🔬 Model Details

- Base Model: atlasia/Terjman-Nano-v1
- Architecture: Transformer-based sequence-to-sequence model
- Training Data: A large, refined corpus of high-quality English-Moroccan Darija parallel translations
- Precision: FP16, for efficient training and inference
## 🚀 How to Use

You can use the model with the Hugging Face Transformers library:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "BounharAbdelaziz/Terjman-Nano-v2.0"

# Load the tokenizer and the fine-tuned seq2seq model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def translate(text):
    """Translate an English sentence into Moroccan Darija."""
    inputs = tokenizer(text, return_tensors="pt")
    output = model.generate(**inputs)
    return tokenizer.decode(output[0], skip_special_tokens=True)

text = "Hello there! Today the weather is so nice in Geneva, couldn't ask for more to enjoy the holidays :)"
translation = translate(text)
print("Translation:", translation)
```
## 🖥️ Deployment

### Run in a Hugging Face Space

Try the model interactively in the Terjman-Nano Space 🤗

### Use with Text Generation Inference (TGI)

For fast inference, use Hugging Face TGI:

```bash
pip install text-generation
text-generation-launcher --model-id BounharAbdelaziz/Terjman-Nano-v2.0
```
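Once the TGI endpoint is running, it can be queried from Python with the `text-generation` client. The address below is an assumption; point it at the host and port your TGI server actually listens on.

```python
# Sketch: querying a running TGI endpoint with the text-generation client.
# The URL is a placeholder for wherever your server is exposed.
from text_generation import Client

client = Client("http://127.0.0.1:8080")
response = client.generate("Hello there!", max_new_tokens=128)
print(response.generated_text)
```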
### Run Locally with Transformers & PyTorch

```bash
pip install transformers torch
python -c "from transformers import pipeline; print(pipeline('translation', model='BounharAbdelaziz/Terjman-Nano-v2.0')('Hello there!'))"
```
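The same call, written as a short script for readability (the behavior is identical to the one-liner above):

```python
# Script form of the one-liner above, using the translation pipeline.
from transformers import pipeline

translator = pipeline("translation", model="BounharAbdelaziz/Terjman-Nano-v2.0")
result = translator("Hello there!")
print(result[0]["translation_text"])
```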
### Deploy on an API Server

Use FastAPI to serve translations as an API:

```python
from fastapi import FastAPI
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

app = FastAPI()

model_name = "BounharAbdelaziz/Terjman-Nano-v2.0"

# Load the model once at startup so every request reuses the same weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

@app.get("/translate/")
def translate(text: str):
    """Translate an English query string into Moroccan Darija."""
    inputs = tokenizer(text, return_tensors="pt")
    output = model.generate(**inputs)
    return {"translation": tokenizer.decode(output[0], skip_special_tokens=True)}
```
## 🛠️ Training Details & Hyperparameters

The model was fine-tuned using the following training settings:

- Learning Rate: 0.0001
- Training Batch Size: 64
- Evaluation Batch Size: 64
- Seed: 42
- Gradient Accumulation Steps: 4
- Total Effective Batch Size: 256
- Optimizer: AdamW (Torch) with betas=(0.9, 0.999), epsilon=1e-08
- Learning Rate Scheduler: Linear
- Warmup Ratio: 0.1
- Epochs: 5
- Precision: Mixed FP16 for efficient training
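For reference, here is a minimal sketch of how these settings might map onto 🤗 Transformers `Seq2SeqTrainingArguments`. This is an assumption about the setup rather than the authors' actual training script; the output directory is a placeholder and data handling is omitted.

```python
# Hypothetical mapping of the reported hyperparameters to Seq2SeqTrainingArguments.
# Not the authors' training script; "./terjman-nano-v2" is a placeholder path.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./terjman-nano-v2",
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,   # effective batch size: 64 * 4 = 256
    num_train_epochs=5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    optim="adamw_torch",             # AdamW, betas=(0.9, 0.999), eps=1e-08
    fp16=True,                       # mixed-precision training
    seed=42,
)
```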
### Framework versions

- Transformers 4.47.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.21.0
## 📜 License

This model is released under the CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial) license: it may be used for research and personal projects, but not for commercial purposes. For commercial use, please get in touch :)
## 📖 Citation

```bibtex
@misc{terjman-v2,
  title = {Terjman-v2: High-Quality English-Moroccan Darija Translation Model},
  author = {Abdelaziz Bounhar},
  year = {2025},
  howpublished = {\url{https://huggingface.co/BounharAbdelaziz/Terjman-Nano-v2.0}},
  license = {CC BY-NC}
}
```