Terjman-Large-v2.0 Open-Source Translation Model - Efficiently Translate between English and Moroccan Dialects for Free

Terjman Large V2.0

Developed by BounharAbdelaziz

Terjman-Large-v2.0 is a Transformer-based English-to-Moroccan-dialect translation model that substantially improves on its predecessor and reports results comparable to commercial models.

Machine Translation

Transformers

Supports Multiple Languages #English-Moroccan Dialect Translation #Culturally Sensitive Translation #Transformer Architecture


Release Date: 2/2/2025

Model Overview

A fine-tuned model dedicated to high-quality and precise translation, especially excelling in English to Moroccan dialect translation tasks.

Model Features

Specialized in English→Moroccan Dialect Translation

Focused on high-quality translation tasks from English to Moroccan dialect.

Top Performance Among Open-Source Models

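Strong results among open-source models on TerjamaBench (22.67 BLEU, versus 14.76 for facebook/nllb-200-3.3B), approaching some commercial models (28.30 BLEU for gpt-4o-2024-08-06).

Compatible with 🤗 Transformers

Loads through the standard Transformers API and can be deployed on a range of hardware.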

Model Capabilities

English to Moroccan Dialect Translation

High-Quality Text Conversion

Culturally Accurate Expression

Use Cases

Language Translation

Daily Conversation Translation

Translate English daily conversations into Moroccan dialect.

Maintains natural expression within cultural context.

Content Localization

Localize English content into Moroccan dialect versions.

Ensures cultural relevance and accuracy.

🚀 🇲🇦 Terjman-Large-v2.0 (240M)

Terjman-Large-v2.0 is an improved version of atlasia/Terjman-Large-v1.2. Built on the Transformer architecture, it is fine-tuned to deliver high-quality, accurate translations. It was trained on a larger and more refined dataset than its predecessor, which improves translation quality. It achieves results comparable to gpt-4o-2024-08-06 on TerjamaBench, an evaluation benchmark for English-Moroccan Darija translation models that emphasizes cultural aspects.

🚀 Quick Start

You can use the model with the Hugging Face Transformers library:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "BounharAbdelaziz/Terjman-Large-v2.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def translate(text):
    inputs = tokenizer(text, return_tensors="pt")
    output = model.generate(**inputs)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example translation
text = "Hello there! Today the weather is so nice in Geneva, couldn't ask for more to enjoy the holidays :)"
translation = translate(text)
print("Translation:", translation)
# prints: صباح الخير! اليوم الطقس زوين بزاف فجنيف، ما قدرتش نطلب المزيد باش نستمتعو بالعطل:)
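
For more than a handful of sentences, translating in batches is usually cheaper than calling translate once per sentence. The following is a minimal sketch and not part of the original model card: the helper names (batched, translate_all, make_hf_translator), the batch size of 8, and the max_new_tokens value of 256 are illustrative assumptions. The Transformers import is deferred so the batching logic can be exercised with a stand-in translator before any weights are downloaded.

```python
from typing import Callable, Iterator, List, Sequence

# A translator maps a batch of source sentences to a batch of translations.
BatchTranslator = Callable[[List[str]], List[str]]


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_all(texts: Sequence[str], translate_batch: BatchTranslator,
                  batch_size: int = 8) -> List[str]:
    """Translate `texts` batch by batch, preserving the input order."""
    results: List[str] = []
    for batch in batched(texts, batch_size):
        results.extend(translate_batch(batch))
    return results


def make_hf_translator(model_name: str = "BounharAbdelaziz/Terjman-Large-v2.0",
                       max_new_tokens: int = 256) -> BatchTranslator:
    """Build a batch translator backed by the model (downloads weights on first use)."""
    # Imported here so the helpers above work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def translate_batch(batch: List[str]) -> List[str]:
        # Padding lets sentences of different lengths share one tensor.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)

    return translate_batch


# Stand-in translator (upper-cases its input) to check the batching alone.
def fake_translate_batch(batch: List[str]) -> List[str]:
    return [sentence.upper() for sentence in batch]


print(translate_all(["hello", "good morning", "thank you"],
                    fake_translate_batch, batch_size=2))
```

To use the real model, pass make_hf_translator() in place of the stand-in; the batch size that fits in memory depends on the hardware and on sentence length.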

✨ Features

✅ Fine-tuned specifically for English to Moroccan darija translation.
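✅ Achieves strong performance among open-source models, outscoring larger baselines such as facebook/nllb-200-3.3B and MBZUAI-Paris/Atlas-Chat-9B on BLEU, chrF, and TER.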
✅ Compatible with 🤗 Transformers and can be easily deployed on various hardware setups.

🔥 Performance Comparison

The following table compares Terjman-Large-v2.0 with proprietary and open-source models using BLEU, chrF, and TER scores. Higher BLEU/chrF scores and lower TER scores indicate better translation quality.

Property	Details
Model Type	Sequence-to-sequence model based on Transformer architecture
Training Data	High-quality parallel corpora with high-quality translations

Model	Size	BLEU↑	chrF↑	TER↓
Proprietary Models
gemini-exp-1206	*	30.69	54.16	67.62
claude-3-5-sonnet-20241022	*	30.51	51.80	67.42
gpt-4o-2024-08-06	*	28.30	50.13	71.77
Open-Source Models
Terjman-Ultra-v2.0	1.3B	25.00	44.70	77.20
Terjman-Supreme-v2.0	3.3B	23.43	44.57	78.17
Terjman-Large-v2.0 (This model)	240M	22.67	42.57	83.00
Terjman-Nano-v2.0	77M	18.84	38.41	94.73
atlasia/Terjman-Large-v1.2.2	240M	16.33	37.10	89.13
MBZUAI-Paris/Atlas-Chat-9B	9B	14.80	35.26	93.95
facebook/nllb-200-3.3B	3.3B	14.76	34.17	94.33
atlasia/Terjman-Nano	77M	9.98	26.55	106.49
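
To make the gap between this model and the earlier 240M Terjman-Large row concrete, the relative change per metric can be computed directly from the table. This is a small arithmetic sketch: the scores are copied from the rows above, while the function name relative_gain and the sign convention (positive always means better) are choices made for this example.

```python
# Scores copied from the comparison table (240M rows only).
current = {"BLEU": 22.67, "chrF": 42.57, "TER": 83.00}   # Terjman-Large-v2.0
previous = {"BLEU": 16.33, "chrF": 37.10, "TER": 89.13}  # earlier atlasia Terjman-Large row

# TER is an error rate, so a decrease counts as an improvement.
LOWER_IS_BETTER = {"TER"}


def relative_gain(metric: str, new: float, old: float) -> float:
    """Percent change of `new` relative to `old`, signed so positive means better."""
    change = (new - old) / old * 100.0
    return -change if metric in LOWER_IS_BETTER else change


gains = {metric: relative_gain(metric, current[metric], previous[metric])
         for metric in current}

for metric, gain in gains.items():
    print(f"{metric}: {gain:+.1f}%")
```

On these numbers the newer model is roughly 38.8% higher on BLEU, 14.7% higher on chrF, and 6.9% lower on TER than the earlier release.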

🔧 Technical Details

Base Model: atlasia/Terjman-Large-v1.2
Architecture: Transformer-based sequence-to-sequence model
Training Data: High-quality parallel corpora with high-quality translations
Training Precision: Mixed FP16 for efficient training

📦 Installation

Run in a Hugging Face Space

Try the model interactively in the Terjman-Large Space 🤗

Use with Text Generation Inference (TGI)

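For fast inference, use Hugging Face TGI. The text-generation-launcher command ships with the TGI server (for example, its official Docker image or a source build); the text-generation package on PyPI installs only the Python client for querying a running server:

# Start the server (requires a TGI server installation)
text-generation-launcher --model-id BounharAbdelaziz/Terjman-Large-v2.0

# Optional: install the Python client
pip install text-generation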

Run Locally with Transformers & PyTorch

pip install transformers torch
python -c "from transformers import pipeline; print(pipeline('translation', model='BounharAbdelaziz/Terjman-Large-v2.0')('Hello there!'))"

Deploy on an API Server

Use FastAPI to serve translations as an API:

from fastapi import FastAPI
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

app = FastAPI()
model_name = "BounharAbdelaziz/Terjman-Large-v2.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

@app.get("/translate/")
def translate(text: str):
    inputs = tokenizer(text, return_tensors="pt")
    output = model.generate(**inputs)
    return {"translation": tokenizer.decode(output[0], skip_special_tokens=True)}
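
Because the endpoint takes the text as a GET query parameter, a client has to URL-encode it before sending. Below is a minimal standard-library client sketch. It assumes the app above is saved as main.py and started with uvicorn main:app on uvicorn's default local address (http://127.0.0.1:8000); the file name and the helper names build_translate_url and fetch_translation are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_translate_url(base_url: str, text: str) -> str:
    """Return the GET URL for the /translate/ endpoint with `text` URL-encoded."""
    query = urlencode({"text": text})
    return f"{base_url.rstrip('/')}/translate/?{query}"


def fetch_translation(base_url: str, text: str, timeout: float = 30.0) -> str:
    """Call a running server and return the 'translation' field of its JSON reply."""
    with urlopen(build_translate_url(base_url, text), timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["translation"]


# Building the URL needs no server; fetch_translation does.
print(build_translate_url("http://127.0.0.1:8000", "Hello there!"))
# prints: http://127.0.0.1:8000/translate/?text=Hello+there%21
```

For long inputs a POST endpoint with a JSON body would avoid URL length limits, but that would be a change to the server shown above.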

📚 Documentation

Training Details: Hyperparameters

The model was fine-tuned using the following training settings:

Learning Rate: 0.001
Training Batch Size: 16
Evaluation Batch Size: 16
Seed: 42
Gradient Accumulation Steps: 8
Total Effective Batch Size: 128
Optimizer: AdamW (Torch) with betas=(0.9,0.999), epsilon=1e-08
Learning Rate Scheduler: Linear
Warmup Ratio: 0.1
Epochs: 2
Precision: Mixed FP16 for efficient training
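
The settings above interact in two ways worth spelling out: the effective batch size is the per-step batch size times the gradient accumulation steps, and the linear scheduler ramps the learning rate up over the first 10% of optimizer steps before decaying it to zero. The sketch below illustrates both under stated assumptions: a single training device (consistent with 16 × 8 = 128) and a hypothetical total of 1000 optimizer steps, since the card does not state the dataset size. It mirrors the usual linear-warmup, linear-decay shape and is not the author's training code.

```python
import math

# Values from the hyperparameter list.
LEARNING_RATE = 0.001
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.1

NUM_DEVICES = 1     # assumption: single device, consistent with 16 * 8 = 128
TOTAL_STEPS = 1000  # hypothetical: the real step count depends on the dataset size


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def linear_lr(step: int, total_steps: int, peak_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at optimizer step `step`: linear warmup, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


print("effective batch size:",
      effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {linear_lr(step, TOTAL_STEPS):.6f}")
```

With these assumed numbers the rate peaks at 0.001 at step 100 and is back to half of that by step 550.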

Framework versions

Transformers 4.47.1
PyTorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.21.0

📄 License

This model is released under the CC BY-NC (Creative Commons Attribution-NonCommercial) license, which means it can be used for research and personal projects but not for commercial purposes. For commercial use, please get in touch :)

@misc{terjman-v2,
  title = {Terjman-v2: High-Quality English-Moroccan Darija Translation Model},
  author = {Abdelaziz Bounhar},
  year = {2025},
  howpublished = {\url{https://huggingface.co/BounharAbdelaziz/Terjman-Large-v2.0}},
  license = {CC BY-NC}
}
