đ English to Afro-Asiatic Languages Translation Model
This project focuses on translating English to a variety of Afro - Asiatic languages. It provides a Transformer - based model with specific pre - processing steps and offers detailed benchmarks and system information.
đ Quick Start
The model is designed to translate from English to multiple Afro - Asiatic languages. To use it, you need to follow the pre - processing steps and use the appropriate language tokens.
Model Information
- Source Group: English
- Target Group: Afro-Asiatic languages
- OPUS Readme: [eng-afa](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-afa/README.md)
- Model: Transformer
- Source Language(s): eng
- Target Language(s): acm afb amh apc ara arq ary arz hau_Latn heb kab mlt rif_Latn shy_Latn som tir
- Pre-processing: normalization + SentencePiece (spm32k, spm32k)
- Language Token Requirement: a sentence-initial language token of the form `>>id<<` is required, where `id` is a valid target language ID (for example, `>>heb<<` for Hebrew).
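A minimal sketch of the language-token requirement, assuming the Hugging Face `transformers` library (with `sentencepiece` and `torch`) is used for inference. The helper names `add_language_token` and `translate` are hypothetical, and the Hub ID `Helsinki-NLP/opus-mt-en-afa` is an assumption based on the `short_pair` value listed under System Info; the allowed IDs are taken from the "Target Language(s)" field above.

```python
from typing import Iterable, List

# Target-language IDs copied from the "Target Language(s)" field of this card.
TARGET_LANGUAGES = frozenset(
    "acm afb amh apc ara arq ary arz hau_Latn heb kab mlt "
    "rif_Latn shy_Latn som tir".split()
)


def add_language_token(sentence: str, target: str) -> str:
    """Prepend the sentence-initial >>id<< token that selects the output language."""
    if target not in TARGET_LANGUAGES:
        raise ValueError(f"unsupported target language ID: {target!r}")
    return f">>{target}<< {sentence}"


def translate(sentences: Iterable[str], target: str,
              model_name: str = "Helsinki-NLP/opus-mt-en-afa") -> List[str]:
    """Translate English sentences into `target` with a MarianMT checkpoint.

    Assumes `transformers`, `sentencepiece` and `torch` are installed and that
    `model_name` (an assumed Hub ID) points at this model.
    """
    # Imported lazily so add_language_token works without these dependencies.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    # Every sentence needs its own >>id<< prefix before tokenization.
    tagged = [add_language_token(s, target) for s in sentences]
    batch = tokenizer(tagged, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, `translate(["How are you?"], "heb")` would request Hebrew output. Validating the ID up front turns a silently wrong or missing token into an explicit error before the model is loaded.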
Downloads
- Original Weights: [opus2m-2020-08-01.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-afa/opus2m-2020-08-01.zip)
- Test Set Translations: [opus2m-2020-08-01.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-afa/opus2m-2020-08-01.test.txt)
- Test Set Scores: [opus2m-2020-08-01.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-afa/opus2m-2020-08-01.eval.txt)
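The three download links above share one URL pattern. The sketch below rebuilds them and fetches the weights archive using only the standard library; the helper names `release_urls` and `download_weights` are hypothetical, and the assumption that other pairs or releases follow the same pattern is not confirmed by this card.

```python
import urllib.request
import zipfile
from pathlib import Path

# Base location of the release files, taken from the download links above.
BASE_URL = "https://object.pouta.csc.fi/Tatoeba-MT-models"


def release_urls(pair: str = "eng-afa", release: str = "opus2m-2020-08-01") -> dict:
    """Build the weights, test-translation and test-score URLs for one release."""
    stem = f"{BASE_URL}/{pair}/{release}"
    return {
        "weights": f"{stem}.zip",
        "test_translations": f"{stem}.test.txt",
        "test_scores": f"{stem}.eval.txt",
    }


def download_weights(dest: Path, pair: str = "eng-afa",
                     release: str = "opus2m-2020-08-01") -> Path:
    """Download the original weights archive and unpack it into `dest`."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"{release}.zip"
    # Network access is required here; nothing is downloaded until this is called.
    urllib.request.urlretrieve(release_urls(pair, release)["weights"], str(archive))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```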
Documentation
Benchmarks
| Testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-amh.eng.amh | 11.6 | 0.504 |
| Tatoeba-test.eng-ara.eng.ara | 12.0 | 0.404 |
| Tatoeba-test.eng-hau.eng.hau | 10.2 | 0.429 |
| Tatoeba-test.eng-heb.eng.heb | 32.3 | 0.551 |
| Tatoeba-test.eng-kab.eng.kab | 1.6 | 0.191 |
| Tatoeba-test.eng-mlt.eng.mlt | 17.7 | 0.551 |
| Tatoeba-test.eng.multi | 14.4 | 0.375 |
| Tatoeba-test.eng-rif.eng.rif | 1.7 | 0.103 |
| Tatoeba-test.eng-shy.eng.shy | 0.8 | 0.090 |
| Tatoeba-test.eng-som.eng.som | 16.0 | 0.429 |
| Tatoeba-test.eng-tir.eng.tir | 2.7 | 0.238 |
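BLEU is reported on a 0 to 100 scale, while chr-F is a character n-gram F-score on a 0 to 1 scale; for the multilingual test set it matches the `chrF2_score` of 0.375 listed under System Info, which suggests β = 2. Assuming the standard chrF definition, with chrP and chrR the character n-gram precision and recall averaged over n-gram orders, the score is:

```latex
\mathrm{chrF}_{\beta} = (1 + \beta^{2})\,
  \frac{\mathrm{chrP} \cdot \mathrm{chrR}}{\beta^{2}\,\mathrm{chrP} + \mathrm{chrR}},
  \qquad \beta = 2
```

With β = 2, recall counts twice as much as precision, so chr-F rewards outputs that cover the reference characters even when they contain extra material.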
System Info
| Property | Details |
|---|---|
| hf_name | eng-afa |
| source_languages | eng |
| target_languages | afa |
| opus_readme_url | https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-afa/README.md |
| original_repo | Tatoeba-Challenge |
| tags | ['translation'] |
| languages | ['en', 'so', 'ti', 'am', 'he', 'mt', 'ar', 'afa'] |
| src_constituents | {'eng'} |
| tgt_constituents | {'som', 'rif_Latn', 'tir', 'kab', 'arq', 'afb', 'amh', 'arz', 'heb', 'shy_Latn', 'apc', 'mlt', 'thv', 'ara', 'hau_Latn', 'acm', 'ary'} |
| src_multilingual | False |
| tgt_multilingual | True |
| prepro | normalization + SentencePiece (spm32k, spm32k) |
| url_model | https://object.pouta.csc.fi/Tatoeba-MT-models/eng-afa/opus2m-2020-08-01.zip |
| url_test_set | https://object.pouta.csc.fi/Tatoeba-MT-models/eng-afa/opus2m-2020-08-01.test.txt |
| src_alpha3 | eng |
| tgt_alpha3 | afa |
| short_pair | en-afa |
| chrF2_score | 0.375 |
| bleu | 14.4 |
| brevity_penalty | 1.0 |
| ref_len | 58110.0 |
| src_name | English |
| tgt_name | Afro-Asiatic languages |
| train_date | 2020-08-01 |
| src_alpha2 | en |
| tgt_alpha2 | afa |
| prefer_old | False |
| long_pair | eng-afa |
| helsinki_git_sha | 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 |
| transformers_git_sha | 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b |
| port_machine | brutasse |
| port_time | 2020-08-21-14:41 |
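The `bleu` and `chrF2_score` values match the Tatoeba-test.eng.multi row, so `brevity_penalty` and `ref_len` presumably describe that test set. A sketch of the standard BLEU brevity penalty shows why a value of 1.0 means the system output was at least as long as the 58,110-token reference; the function name is hypothetical, and the output length `hyp_len` is not reported in this card.

```python
import math


def brevity_penalty(hyp_len: float, ref_len: float) -> float:
    """Standard BLEU brevity penalty for a corpus of total length hyp_len."""
    if hyp_len <= 0:
        return 0.0  # empty output receives no credit
    if hyp_len >= ref_len:
        return 1.0  # no penalty when the output is not shorter than the references
    # Outputs shorter than the references are penalized exponentially.
    return math.exp(1.0 - ref_len / hyp_len)
```

For example, an output half as long as the references (`brevity_penalty(29055, 58110)`) would be scaled by e^-1, roughly 0.37.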
License
This project is licensed under the Apache-2.0 license.