opus-mt-tc-big-ar-en Open-Source Translation Model - Free and Accurate Arabic to English Translation


Opus Mt Tc Big Ar En

Developed by Helsinki-NLP

This is a neural machine translation model for Arabic to English translation, part of the OPUS-MT project.

Machine Translation

Transformers

Supports Multiple Languages #Arabic-English Translation #High BLEU Score #Multi-dialect Support

Downloads 18.14k

Release Time: 4/13/2022

Model Overview

This model is specifically designed for translating Arabic (including Gulf dialects, Modern Standard Arabic, and Egyptian Arabic) into English, trained using the transformer-big architecture.

Model Features

Multi-dialect Support

Supports translation from Gulf Arabic, Modern Standard Arabic, and Egyptian Arabic to English.

High-performance Translation

Achieves BLEU scores from 42.6 (flores101-devtest) to 47.3 (tatoeba-test-v2021-08-07) across three test sets.

Open-source License

Released under the CC-BY-4.0 license, which permits use and modification with attribution.

Model Capabilities

Arabic to English text translation

Supports multiple Arabic dialects

Batch text processing

Use Cases

Text Translation

Document Translation

Translate Arabic documents into English

Reaches a BLEU score of 47.3 on the tatoeba-test-v2021-08-07 test set

Website Localization

Translate Arabic website content into English

Education

Language Learning Assistance

Help English-speaking learners understand Arabic materials

🚀 opus-mt-tc-big-ar-en

A neural machine translation model designed for translating from Arabic (ar) to English (en).

This model is part of the OPUS-MT project, an initiative to make neural machine translation models widely available and accessible for many languages. All models are originally trained with Marian NMT, an efficient NMT implementation written in pure C++. The models have been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from OPUS, and the training pipelines follow the procedures of OPUS-MT-train.

Publications: "OPUS-MT – Building open translation services for the World" and "The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT" (please cite them if you use this model).

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

🚀 Quick Start

This is a neural machine translation model for translating from Arabic to English. The Usage Examples section shows how to load and run it with the transformers library.

✨ Features

Multilingual Support: Supports translation from Arabic to English.
Open Source Project: Part of the OPUS-MT project, with open-source training data and pipelines.
High-performance Framework: Trained using the Marian NMT framework and converted to PyTorch.


💻 Usage Examples

Basic Usage

from transformers import MarianMTModel, MarianTokenizer

# Arabic source sentences to translate
src_text = [
    "اتبع قلبك فحسب.",
    "وين راهي دّوش؟"
]

# Hugging Face Hub ID of the model
model_name = "Helsinki-NLP/opus-mt-tc-big-ar-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch with padding, then generate the translations
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

# Decode each output sequence back to text
for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

# expected output:
#     Just follow your heart.
#     Wayne Rahi Dosh?
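
The capabilities list mentions batch text processing, and the example above passes the whole list to the tokenizer in a single call. For longer lists it may help to translate in fixed-size chunks so that memory use stays bounded. The following is a minimal sketch under that assumption: `chunked`, `translate_in_batches`, and `make_marian_translator` are hypothetical helper names, not part of transformers, and the default batch size of 8 is arbitrary.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def translate_in_batches(
    texts: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,  # arbitrary default, tune for your hardware
) -> List[str]:
    """Translate `texts` chunk by chunk, preserving the input order."""
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        results.extend(translate_batch(batch))
    return results


def make_marian_translator(
    model_name: str = "Helsinki-NLP/opus-mt-tc-big-ar-en",
) -> Callable[[List[str]], List[str]]:
    """Hypothetical factory: load the model once and return a batch translator."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate_batch(batch: List[str]) -> List[str]:
        # Pad to the longest sentence in the batch; truncate overlong inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**inputs)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate_batch
```

With these helpers, `translate_in_batches(src_text, make_marian_translator(), batch_size=16)` would translate the list in chunks of 16. Passing the translator as a callable keeps the chunking logic independent of the model, so it can be checked with a stand-in function before downloading any weights.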

Advanced Usage

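from transformers import pipeline

# The translation pipeline wraps tokenization, generation, and decoding
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-ar-en")

# The pipeline returns a list of dicts, one per input, keyed by "translation_text"
print(pipe("اتبع قلبك فحسب.")[0]["translation_text"])

# expected output: Just follow your heart.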
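
For the document translation use case, one common approach is to split the text into sentences and translate them as a batch. The sketch below assumes the model handles sentence-sized inputs best, which the model card does not state. `split_sentences` and `translate_document` are hypothetical helpers, and the regular expression only recognizes `.`, `!`, `?`, and the Arabic question mark as sentence boundaries.

```python
import re
from typing import Callable, List

# Whitespace that follows a sentence-final mark: ., !, ? or the
# Arabic question mark (U+061F). This is a rough heuristic, not a
# full sentence segmenter.
_SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")


def split_sentences(text: str) -> List[str]:
    """Split `text` at sentence boundaries and drop empty pieces."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def translate_document(
    text: str,
    translate_batch: Callable[[List[str]], List[str]],
) -> str:
    """Translate a document sentence by sentence and rejoin with spaces."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    return " ".join(translate_batch(sentences))
```

Here `translate_batch` can be any function that maps a list of Arabic strings to a list of English strings, for example `lambda batch: [r["translation_text"] for r in pipe(batch)]` using the pipeline above.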

📚 Documentation

Model Info

Property	Details
Release	2022-03-09
Source Language(s)	afb ara arz
Target Language(s)	eng
Model Type	transformer-big
Training Data	opusTCv20210807+bt (source)
Tokenization	SentencePiece (spm32k,spm32k)
Original Model	opusTCv20210807+bt_transformer-big_2022-03-09.zip
More Information	OPUS-MT ara-eng README

Benchmarks

test set translations: opusTCv20210807+bt_transformer-big_2022-03-09.test.txt
test set scores: opusTCv20210807+bt_transformer-big_2022-03-09.eval.txt
benchmark results: benchmark_results.txt
benchmark output: benchmark_translations.zip

langpair	testset	chr-F	BLEU	#sent	#words
ara-eng	tatoeba-test-v2021-08-07	0.63477	47.3	10305	76975
ara-eng	flores101-devtest	0.66987	42.6	1012	24721
ara-eng	tico19-test	0.68521	44.4	2100	56323

Model Conversion Info

Property	Details
Transformers Version	4.16.2
OPUS-MT Git Hash	3405783
Port Time	Wed Apr 13 18:17:57 EEST 2022
Port Machine	LM0-400-22516.local


📄 License

This model is released under the CC-BY-4.0 license.

Acknowledgements

The work is supported by the European Language Grid as pilot project 2866, by the FoTran project, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and the MeMAD project, funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland.
