English to Niger-Kordofanian Languages Translation Model
This project provides a transformer-based model for translating English into Niger-Kordofanian languages. It documents the required pre-processing steps and reports evaluation metrics on the test sets.
Quick Start
The model translates English into Niger-Kordofanian languages. You can download the original weights, test set translations, and test set scores from the provided links.
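Download Links
- Original weights: https://object.pouta.csc.fi/Tatoeba-MT-models/eng-nic/opus-2020-07-27.zip
- Test set translations: https://object.pouta.csc.fi/Tatoeba-MT-models/eng-nic/opus-2020-07-27.test.txt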
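As a minimal sketch of fetching these files, the snippet below uses only the Python standard library and the two URLs recorded as `url_model` and `url_test_set` in the System Info table. The helper names (`local_name`, `fetch`, `unpack`) and the `eng-nic` destination directory are illustrative assumptions, not part of the project.

```python
import posixpath
import urllib.request
import zipfile
from pathlib import Path
from urllib.parse import urlparse

# URLs taken from the url_model and url_test_set rows of the System Info table.
MODEL_URL = "https://object.pouta.csc.fi/Tatoeba-MT-models/eng-nic/opus-2020-07-27.zip"
TEST_SET_URL = "https://object.pouta.csc.fi/Tatoeba-MT-models/eng-nic/opus-2020-07-27.test.txt"


def local_name(url):
    """Return the file name component of a URL (hypothetical helper)."""
    return posixpath.basename(urlparse(url).path)


def fetch(url, dest_dir="eng-nic"):
    """Download url into dest_dir unless the file is already there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / local_name(url)
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


def unpack(archive):
    """Extract a zip archive next to itself and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(Path(archive).parent)
        return zf.namelist()
```

Calling `unpack(fetch(MODEL_URL))` would download and extract the original weights, and `fetch(TEST_SET_URL)` would store the test set translations alongside them. Nothing is downloaded until these functions are called.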
Features
- Language Support: Supports translation from English to multiple Niger-Kordofanian languages, including `bam_Latn`, `ewe`, `fuc`, `fuv`, `ibo`, `kin`, etc.
- Pre-processing: Uses normalization and SentencePiece (spm32k, spm32k) for pre-processing.
- Language Token Requirement: A sentence-initial language token in the form `>>id<<` (where `id` is a valid target language ID) is required.
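The language token requirement can be sketched as a small helper that prepends `>>id<<` to each source sentence. The target IDs below are the ones listed in this document. The `translate` function is an assumption-laden illustration: it presumes the model is available to the Hugging Face `transformers` library under the name `Helsinki-NLP/opus-mt-en-nic`, which this document does not state, so adjust `model_name` as needed.

```python
# Target language IDs as listed in this document.
VALID_TARGETS = {
    "bam_Latn", "ewe", "fuc", "fuv", "ibo", "kin", "lin", "lug", "nya", "run",
    "sag", "sna", "swh", "toi_Latn", "tso", "umb", "wol", "xho", "yor", "zul",
}


def add_language_token(sentence, target_id):
    """Prefix a source sentence with the >>id<< token the model expects."""
    if target_id not in VALID_TARGETS:
        raise ValueError(f"unknown target language ID: {target_id!r}")
    return f">>{target_id}<< {sentence.strip()}"


def translate(sentences, target_id, model_name="Helsinki-NLP/opus-mt-en-nic"):
    """Translate English sentences; model_name is an assumed identifier."""
    # Imported lazily so add_language_token works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    prepared = [add_language_token(s, target_id) for s in sentences]
    batch = tokenizer(prepared, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, `add_language_token("How are you?", "zul")` produces `>>zul<< How are you?`, and `translate(["How are you?"], "zul")` would pass that string to the model.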
Documentation
Model Details
- Model Type: Transformer
- Source Language: English (`eng`)
- Target Languages: `bam_Latn`, `ewe`, `fuc`, `fuv`, `ibo`, `kin`, `lin`, `lug`, `nya`, `run`, `sag`, `sna`, `swh`, `toi_Latn`, `tso`, `umb`, `wol`, `xho`, `yor`, `zul`
Pre-processing
The model uses normalization and SentencePiece (spm32k, spm32k) for pre-processing.
System Info
| Property | Details |
| --- | --- |
| hf_name | eng-nic |
| source_languages | eng |
| target_languages | nic |
| opus_readme_url | https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-nic/README.md |
| original_repo | Tatoeba-Challenge |
| tags | ['translation'] |
| languages | ['en', 'sn', 'rw', 'wo', 'ig', 'sg', 'ee', 'zu', 'lg', 'ts', 'ln', 'ny', 'yo', 'rn', 'xh', 'nic'] |
| src_constituents | {'eng'} |
| tgt_constituents | {'bam_Latn', 'sna', 'kin', 'wol', 'ibo', 'swh', 'sag', 'ewe', 'zul', 'fuc', 'lug', 'tso', 'lin', 'nya', 'yor', 'run', 'xho', 'fuv', 'toi_Latn', 'umb'} |
| src_multilingual | False |
| tgt_multilingual | True |
| prepro | normalization + SentencePiece (spm32k, spm32k) |
| url_model | https://object.pouta.csc.fi/Tatoeba-MT-models/eng-nic/opus-2020-07-27.zip |
| url_test_set | https://object.pouta.csc.fi/Tatoeba-MT-models/eng-nic/opus-2020-07-27.test.txt |
| src_alpha3 | eng |
| tgt_alpha3 | nic |
| short_pair | en-nic |
| chrF2_score | 0.427 |
| bleu | 11.1 |
| brevity_penalty | 1.0 |
| ref_len | 10625.0 |
| src_name | English |
| tgt_name | Niger-Kordofanian languages |
| train_date | 2020-07-27 |
| src_alpha2 | en |
| tgt_alpha2 | nic |
| prefer_old | False |
| long_pair | eng-nic |
| helsinki_git_sha | 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 |
| transformers_git_sha | 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b |
| port_machine | brutasse |
| port_time | 2020-08-21-14:41 |
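The `brevity_penalty` and `ref_len` rows can be read through the standard BLEU definition, in which the penalty is 1 when the system output is at least as long as the reference and `exp(1 - ref_len / hyp_len)` otherwise. The sketch below illustrates that formula. The system output length is not reported in this document, so the value 9000 is purely hypothetical.

```python
import math


def brevity_penalty(hyp_len, ref_len):
    """Standard BLEU brevity penalty for total hypothesis and reference lengths."""
    if hyp_len <= 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


REF_LEN = 10625  # ref_len from the table above

# An output at least as long as the reference is not penalized,
# which matches the reported brevity_penalty of 1.0.
print(brevity_penalty(REF_LEN, REF_LEN))

# Hypothetical shorter output, shown only to illustrate the penalty.
print(brevity_penalty(9000, REF_LEN))
```

A reported penalty of 1.0 therefore only indicates that the total output length on the test set was not shorter than the 10625 reference tokens.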
License
This project is licensed under the Apache-2.0 license.
Benchmarks
| Testset | BLEU | chr-F |
| --- | --- | --- |
| Tatoeba-test.eng-bam.eng.bam | 6.2 | 0.029 |
| Tatoeba-test.eng-ewe.eng.ewe | 4.5 | 0.258 |
| Tatoeba-test.eng-ful.eng.ful | 0.5 | 0.073 |
| Tatoeba-test.eng-ibo.eng.ibo | 3.9 | 0.267 |
| Tatoeba-test.eng-kin.eng.kin | 6.4 | 0.475 |
| Tatoeba-test.eng-lin.eng.lin | 1.2 | 0.308 |
| Tatoeba-test.eng-lug.eng.lug | 3.9 | 0.405 |
| Tatoeba-test.eng.multi | 11.1 | 0.427 |
| Tatoeba-test.eng-nya.eng.nya | 14.0 | 0.622 |
| Tatoeba-test.eng-run.eng.run | 13.6 | 0.477 |
| Tatoeba-test.eng-sag.eng.sag | 5.5 | 0.199 |
| Tatoeba-test.eng-sna.eng.sna | 19.6 | 0.557 |
| Tatoeba-test.eng-swa.eng.swa | 1.8 | 0.163 |
| Tatoeba-test.eng-toi.eng.toi | 8.3 | 0.231 |
| Tatoeba-test.eng-tso.eng.tso | 50.0 | 0.789 |
| Tatoeba-test.eng-umb.eng.umb | 7.8 | 0.342 |
| Tatoeba-test.eng-wol.eng.wol | 6.7 | 0.143 |
| Tatoeba-test.eng-xho.eng.xho | 26.4 | 0.620 |
| Tatoeba-test.eng-yor.eng.yor | 15.5 | 0.342 |
| Tatoeba-test.eng-zul.eng.zul | 35.9 | 0.750 |
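Because quality varies widely across target languages, it can help to rank the per-language results before choosing the model for a given target. The sketch below copies the per-language scores from the table above (the `multi` row is left out) and ranks them. The helper names are illustrative.

```python
# (BLEU, chr-F) per test set, copied from the benchmark table above.
SCORES = {
    "bam": (6.2, 0.029), "ewe": (4.5, 0.258), "ful": (0.5, 0.073),
    "ibo": (3.9, 0.267), "kin": (6.4, 0.475), "lin": (1.2, 0.308),
    "lug": (3.9, 0.405), "nya": (14.0, 0.622), "run": (13.6, 0.477),
    "sag": (5.5, 0.199), "sna": (19.6, 0.557), "swa": (1.8, 0.163),
    "toi": (8.3, 0.231), "tso": (50.0, 0.789), "umb": (7.8, 0.342),
    "wol": (6.7, 0.143), "xho": (26.4, 0.620), "yor": (15.5, 0.342),
    "zul": (35.9, 0.750),
}
BLEU, CHRF = 0, 1  # positions of each metric inside the score tuples


def rank_by(metric_index):
    """Return test-set language codes sorted from best to worst on one metric."""
    return sorted(SCORES, key=lambda lang: SCORES[lang][metric_index], reverse=True)


def unweighted_mean(metric_index):
    """Average a metric over languages, giving each test set equal weight."""
    values = [score[metric_index] for score in SCORES.values()]
    return sum(values) / len(values)


by_bleu = rank_by(BLEU)
print("Highest BLEU:", by_bleu[:3])
print("Lowest BLEU:", by_bleu[-3:])
print("Unweighted mean BLEU:", round(unweighted_mean(BLEU), 2))
```

The unweighted mean gives every language equal weight, so it need not match the `Tatoeba-test.eng.multi` score, which is reported for the combined multilingual test set.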