Opus-MT-En-Afa Open-Source Machine Translation Model - Translate from English into Multiple Afro-Asiatic Languages

Opus Mt En Afa

Developed by Helsinki-NLP

This is a machine translation model based on the Transformer architecture, supporting translation from English to multiple Afroasiatic languages, including Arabic dialects, Amharic, Hebrew, etc.

Machine Translation

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Multidialectal Arabic Support #Low-resource Language Translation #African Language Coverage

Downloads 103

Release Time: 3/2/2022

Model Overview

This model translates from English into Afroasiatic languages. Input text is normalized and tokenized with SentencePiece, and the target set covers several Arabic dialects alongside other Afroasiatic languages.

Model Features

Multilingual Support

Supports translation from English to 16 Afroasiatic languages, including various Arabic dialects.

Standardized Preprocessing

Preprocesses text with normalization followed by SentencePiece tokenization (spm32k).

Language Identifier

Each input sentence must begin with a target language token (e.g., >>ara<<) so the model knows which language to produce.

Model Capabilities

Text translation from English to Afroasiatic languages

Supports translation of various Arabic dialects

Supports low-resource language translation

Use Cases

Cross-language Communication

English-Arabic Translation

Translate English content into Standard Arabic or specific dialects

BLEU 12.0 (Tatoeba test set, target ID ara)

English-Hebrew Translation

Daily phrase translation from English to Hebrew

BLEU 32.3

Low-resource Language Support

English-Somali Translation

Provide English content translation for Somali speakers

BLEU 16.0

🚀 English to Afro-Asiatic Languages Translation Model

This project focuses on translating English into a variety of Afro-Asiatic languages. It provides a Transformer-based model with specific pre-processing steps, along with detailed benchmarks and system information.

🚀 Quick Start

The model translates from English into multiple Afro-Asiatic languages. To use it, apply the pre-processing steps and prefix each sentence with the appropriate target language token.

Model Information

Source Group: English
Target Group: Afro-Asiatic languages
OPUS Readme: [eng-afa](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-afa/README.md)
Model: Transformer
Source Language(s): eng
Target Language(s): acm afb amh apc ara arq ary arz hau_Latn heb kab mlt rif_Latn shy_Latn som tir
Pre-processing: normalization + SentencePiece (spm32k,spm32k)
Language Token Requirement: A sentence-initial language token is required in the form >>id<< (id = valid target language ID)
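
The language token requirement can be sketched in Python as follows. This is a minimal sketch, not the authors' reference usage. It assumes the Hugging Face `transformers` library with PyTorch is installed and that the model is published on the Hub under the identifier `Helsinki-NLP/opus-mt-en-afa`. That identifier is an assumption, since this card does not state it. The helper names `add_language_token` and `translate` are hypothetical and exist only for illustration.

```python
from typing import List

# Target language IDs copied from the "Target Language(s)" field above.
TARGET_IDS = {
    "acm", "afb", "amh", "apc", "ara", "arq", "ary", "arz",
    "hau_Latn", "heb", "kab", "mlt", "rif_Latn", "shy_Latn", "som", "tir",
}


def add_language_token(sentences: List[str], target_id: str) -> List[str]:
    """Prefix each English sentence with the >>id<< token (hypothetical helper)."""
    if target_id not in TARGET_IDS:
        raise ValueError(f"unknown target language ID: {target_id!r}")
    return [f">>{target_id}<< {s.strip()}" for s in sentences]


def translate(sentences: List[str], target_id: str,
              model_name: str = "Helsinki-NLP/opus-mt-en-afa") -> List[str]:
    """Translate English sentences (hypothetical helper).

    The default model_name is an assumed Hub identifier. The first call
    downloads the model weights and tokenizer files.
    """
    # Imported lazily so add_language_token works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    # The tokenizer applies the model's SentencePiece segmentation.
    batch = tokenizer(add_language_token(sentences, target_id),
                      return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(["How are you?"], "heb")` would request a Hebrew translation. Validating the ID before building the prompt catches a mistyped token before any input reaches the model.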

Downloads

Original Weights: [opus2m-2020-08-01.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-afa/opus2m-2020-08-01.zip)
Test Set Translations: [opus2m-2020-08-01.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-afa/opus2m-2020-08-01.test.txt)
Test Set Scores: [opus2m-2020-08-01.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-afa/opus2m-2020-08-01.eval.txt)

📚 Documentation

Benchmarks

Test set	BLEU	chr-F
Tatoeba-test.eng-amh.eng.amh	11.6	0.504
Tatoeba-test.eng-ara.eng.ara	12.0	0.404
Tatoeba-test.eng-hau.eng.hau	10.2	0.429
Tatoeba-test.eng-heb.eng.heb	32.3	0.551
Tatoeba-test.eng-kab.eng.kab	1.6	0.191
Tatoeba-test.eng-mlt.eng.mlt	17.7	0.551
Tatoeba-test.eng.multi	14.4	0.375
Tatoeba-test.eng-rif.eng.rif	1.7	0.103
Tatoeba-test.eng-shy.eng.shy	0.8	0.090
Tatoeba-test.eng-som.eng.som	16.0	0.429
Tatoeba-test.eng-tir.eng.tir	2.7	0.238

System Info

Property	Details
hf_name	eng-afa
source_languages	eng
target_languages	afa
opus_readme_url	https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-afa/README.md
original_repo	Tatoeba-Challenge
tags	['translation']
languages	['en', 'so', 'ti', 'am', 'he', 'mt', 'ar', 'afa']
src_constituents	{'eng'}
tgt_constituents	{'som', 'rif_Latn', 'tir', 'kab', 'arq', 'afb', 'amh', 'arz', 'heb', 'shy_Latn', 'apc', 'mlt', 'thv', 'ara', 'hau_Latn', 'acm', 'ary'}
src_multilingual	False
tgt_multilingual	True
prepro	normalization + SentencePiece (spm32k,spm32k)
url_model	https://object.pouta.csc.fi/Tatoeba-MT-models/eng-afa/opus2m-2020-08-01.zip
url_test_set	https://object.pouta.csc.fi/Tatoeba-MT-models/eng-afa/opus2m-2020-08-01.test.txt
src_alpha3	eng
tgt_alpha3	afa
short_pair	en-afa
chrF2_score	0.375
bleu	14.4
brevity_penalty	1.0
ref_len	58110.0
src_name	English
tgt_name	Afro-Asiatic languages
train_date	2020-08-01
src_alpha2	en
tgt_alpha2	afa
prefer_old	False
long_pair	eng-afa
helsinki_git_sha	480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535
transformers_git_sha	2207e5d8cb224e954a7cba69fa4ac2309e9ff30b
port_machine	brutasse
port_time	2020-08-21-14:41

📄 License

This project is licensed under the Apache-2.0 license.
