English to Bantu Languages Translation Model
This project provides a transformer-based model for translating English into several Bantu languages, together with its pre-processing requirements and per-language benchmarks.
Quick Start
The model translates English into Bantu languages. To use it, apply the pre-processing steps listed under Features and begin each input sentence with the token for the desired target language.
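One possible way to run the model is through the Hugging Face `transformers` library. The sketch below assumes that the converted checkpoint is published on the Hub as `Helsinki-NLP/opus-mt-en-bnt` (a name inferred from the short pair `en-bnt`, not stated in this README) and that `transformers`, `sentencepiece`, and PyTorch are installed. `translate` is a hypothetical helper name, and the sample sentences are illustrative only.

```python
from typing import List

# Assumption: Hub ID of the converted checkpoint; not given in this README.
MODEL_NAME = "Helsinki-NLP/opus-mt-en-bnt"


def translate(tagged_sentences: List[str], model_name: str = MODEL_NAME) -> List[str]:
    """Translate English sentences that already start with a >>id<< token."""
    # Imported here so this module loads even when transformers is not installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # The tokenizer applies the SentencePiece segmentation for the batch.
    batch = tokenizer(tagged_sentences, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


# Example call (downloads the checkpoint on first use):
# translate([">>zul<< How are you?", ">>sna<< Thank you very much."])
```

Each input sentence carries its own target token, so a single batch can mix target languages.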
Features
- Multi-target translation: translates English into multiple Bantu languages: kin, lin, lug, nya, run, sna, swh, toi_Latn, tso, umb, xho, zul.
- Pre-processing: normalization and SentencePiece (spm32k, spm32k).
- Language token requirement: a sentence-initial language token of the form `>>id<<` (id = valid target language ID) is required.
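The language token requirement can be enforced with a small helper before text reaches the model. This is a minimal sketch: `TARGET_IDS` is copied from the target languages listed above, while the function name `add_language_token` and the choice to raise `ValueError` on an unknown ID are assumptions made for illustration.

```python
# Target language IDs as listed in this README.
TARGET_IDS = frozenset({
    "kin", "lin", "lug", "nya", "run", "sna",
    "swh", "toi_Latn", "tso", "umb", "xho", "zul",
})


def add_language_token(sentence: str, target_id: str) -> str:
    """Return the sentence prefixed with the >>id<< token for target_id."""
    if target_id not in TARGET_IDS:
        raise ValueError(
            f"Unsupported target language ID {target_id!r}; "
            f"expected one of {sorted(TARGET_IDS)}"
        )
    # Strip surrounding whitespace so the token is truly sentence-initial.
    return f">>{target_id}<< {sentence.strip()}"


tagged = add_language_token("Good morning.", "zul")
print(tagged)  # >>zul<< Good morning.
```

Rejecting unknown IDs early makes a mistyped token fail loudly instead of being passed to the model as ordinary text.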
Documentation
eng-bnt
- Source Group: English
- Target Group: Bantu languages
- OPUS Readme: eng-bnt
- Model: Transformer
- Source Language(s): eng
- Target Language(s): kin lin lug nya run sna swh toi_Latn tso umb xho zul
- Pre-processing: normalization + SentencePiece (spm32k, spm32k)
- Language Token Requirement: a sentence-initial language token of the form `>>id<<` (id = valid target language ID) is required.
- Download Original Weights: opus-2020-07-26.zip
- Test Set Translations: opus-2020-07-26.test.txt
- Test Set Scores: opus-2020-07-26.eval.txt
Benchmarks
| Testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-kin.eng.kin | 12.5 | 0.519 |
| Tatoeba-test.eng-lin.eng.lin | 1.1 | 0.277 |
| Tatoeba-test.eng-lug.eng.lug | 4.8 | 0.415 |
| Tatoeba-test.eng.multi | 12.1 | 0.449 |
| Tatoeba-test.eng-nya.eng.nya | 22.1 | 0.616 |
| Tatoeba-test.eng-run.eng.run | 13.2 | 0.492 |
| Tatoeba-test.eng-sna.eng.sna | 32.1 | 0.669 |
| Tatoeba-test.eng-swa.eng.swa | 1.7 | 0.180 |
| Tatoeba-test.eng-toi.eng.toi | 10.7 | 0.266 |
| Tatoeba-test.eng-tso.eng.tso | 26.9 | 0.631 |
| Tatoeba-test.eng-umb.eng.umb | 5.2 | 0.295 |
| Tatoeba-test.eng-xho.eng.xho | 22.6 | 0.615 |
| Tatoeba-test.eng-zul.eng.zul | 41.1 | 0.769 |
System Info
| Property | Details |
|---|---|
| hf_name | eng-bnt |
| source_languages | eng |
| target_languages | bnt |
| opus_readme_url | https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-bnt/README.md |
| original_repo | Tatoeba-Challenge |
| tags | ['translation'] |
| languages | ['en', 'sn', 'zu', 'rw', 'lg', 'ts', 'ln', 'ny', 'xh', 'rn', 'bnt'] |
| src_constituents | {'eng'} |
| tgt_constituents | {'sna', 'zul', 'kin', 'lug', 'tso', 'lin', 'nya', 'xho', 'swh', 'run', 'toi_Latn', 'umb'} |
| src_multilingual | False |
| tgt_multilingual | True |
| prepro | normalization + SentencePiece (spm32k, spm32k) |
| url_model | https://object.pouta.csc.fi/Tatoeba-MT-models/eng-bnt/opus-2020-07-26.zip |
| url_test_set | https://object.pouta.csc.fi/Tatoeba-MT-models/eng-bnt/opus-2020-07-26.test.txt |
| src_alpha3 | eng |
| tgt_alpha3 | bnt |
| short_pair | en-bnt |
| chrF2_score | 0.449 |
| bleu | 12.1 |
| brevity_penalty | 1.0 |
| ref_len | 9989.0 |
| src_name | English |
| tgt_name | Bantu languages |
| train_date | 2020-07-26 |
| src_alpha2 | en |
| tgt_alpha2 | bnt |
| prefer_old | False |
| long_pair | eng-bnt |
| helsinki_git_sha | 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 |
| transformers_git_sha | 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b |
| port_machine | brutasse |
| port_time | 2020-08-21-14:41 |
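The original weights and test-set translations can be fetched from the `url_model` and `url_test_set` addresses in the table above. The following is a minimal sketch using only the Python standard library; the function names `artifact_filename` and `download_artifact` and the default destination directory are hypothetical, and nothing is downloaded until `download_artifact` is called.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# URLs copied from the url_model and url_test_set rows.
URL_MODEL = "https://object.pouta.csc.fi/Tatoeba-MT-models/eng-bnt/opus-2020-07-26.zip"
URL_TEST_SET = "https://object.pouta.csc.fi/Tatoeba-MT-models/eng-bnt/opus-2020-07-26.test.txt"


def artifact_filename(url: str) -> str:
    """Return the last path component of the URL, used as the local file name."""
    return Path(urlparse(url).path).name


def download_artifact(url: str, dest_dir: str = "eng-bnt") -> Path:
    """Download url into dest_dir (created if missing) and return the local path."""
    directory = Path(dest_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / artifact_filename(url)
    urllib.request.urlretrieve(url, str(target))
    return target


# Example calls (require network access):
# download_artifact(URL_MODEL)
# download_artifact(URL_TEST_SET)
```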
License
The project is licensed under the Apache-2.0 license.