🚀 Bantu Languages to English Translation Model
This model card describes a transformer model that translates Bantu languages into English. It covers the model's details, its benchmark performance, and the associated system information.
✨ Features
- Multilingual Support: Supports multiple Bantu languages, including `sn`, `zu`, `rw`, `lg`, `ts`, `ln`, `ny`, `xh`, `rn`, and `bnt`, for translation to English.
- Transformer Model: Utilizes a transformer model for translation tasks.
- Pre-processing: Applies normalization and SentencePiece (spm32k, spm32k).
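
The two-letter tags under Multilingual Support are ISO 639-1 codes, while the source languages are listed elsewhere in this README with three-letter ISO 639-3 codes. As a rough aid for reading the two lists side by side, the sketch below maps one onto the other. The dictionary and the helper name `untagged_constituents` are illustrative assumptions and not part of the model's metadata.

```python
# ISO 639-1 tags from the feature list mapped to their ISO 639-3 equivalents.
# This mapping is an illustrative aid, not part of the model's metadata.
ISO_639_1_TO_3 = {
    "sn": "sna",  # Shona
    "zu": "zul",  # Zulu
    "rw": "kin",  # Kinyarwanda
    "lg": "lug",  # Ganda
    "ts": "tso",  # Tsonga
    "ln": "lin",  # Lingala
    "ny": "nya",  # Chichewa (Nyanja)
    "xh": "xho",  # Xhosa
    "rn": "run",  # Rundi
}

# Source constituents as listed in this README.
SOURCE_CONSTITUENTS = {
    "sna", "zul", "kin", "lug", "tso", "lin",
    "nya", "xho", "swh", "run", "toi_Latn", "umb",
}


def untagged_constituents():
    """Return source constituents with no two-letter tag in the feature list."""
    return sorted(SOURCE_CONSTITUENTS - set(ISO_639_1_TO_3.values()))


if __name__ == "__main__":
    print(untagged_constituents())
```

The helper returns the constituents that the model covers but that have no individual two-letter tag in the feature list: `swh` (Swahili), `toi_Latn` (Tonga, Latin script), and `umb` (Umbundu).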
📦 Installation
No specific installation steps are provided.
📚 Documentation
Bantu to English Translation Details
- Source Group: Bantu languages
- Target Group: English
- OPUS Readme: [bnt-eng](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/bnt-eng/README.md)
- Model: transformer
- Source Languages: `kin`, `lin`, `lug`, `nya`, `run`, `sna`, `swh`, `toi_Latn`, `tso`, `umb`, `xho`, `zul`
- Target Language: `eng`
- Pre-processing: normalization + SentencePiece (spm32k, spm32k)
- Download Original Weights: [opus2m-2020-07-31.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/bnt-eng/opus2m-2020-07-31.zip)
- Test Set Translations: [opus2m-2020-07-31.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/bnt-eng/opus2m-2020-07-31.test.txt)
- Test Set Scores: [opus2m-2020-07-31.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/bnt-eng/opus2m-2020-07-31.eval.txt)
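
The three download links above share one pattern: a base location, the language pair, and a release name followed by a suffix. The sketch below rebuilds them from those parts, which can be handy when scripting downloads. The function name `artifact_urls` and the dictionary keys are assumptions made for illustration; only the URL components come from the links listed above.

```python
# Components copied from the download links listed above.
BASE_URL = "https://object.pouta.csc.fi/Tatoeba-MT-models"
PAIR = "bnt-eng"
RELEASE = "opus2m-2020-07-31"

# Suffix of each published artifact (the keys are illustrative names).
SUFFIXES = {
    "weights": ".zip",
    "test_translations": ".test.txt",
    "test_scores": ".eval.txt",
}


def artifact_urls(base=BASE_URL, pair=PAIR, release=RELEASE):
    """Return a dict mapping each artifact kind to its download URL."""
    return {
        kind: f"{base}/{pair}/{release}{suffix}"
        for kind, suffix in SUFFIXES.items()
    }


if __name__ == "__main__":
    for kind, url in artifact_urls().items():
        print(f"{kind}: {url}")
```

The sketch only builds strings and does not check that the files are still hosted at those locations.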
Benchmarks
| Testset | BLEU | chr-F |
| --- | --- | --- |
| Tatoeba-test.kin-eng.kin.eng | 31.7 | 0.481 |
| Tatoeba-test.lin-eng.lin.eng | 8.3 | 0.271 |
| Tatoeba-test.lug-eng.lug.eng | 5.3 | 0.128 |
| Tatoeba-test.multi.eng | 23.1 | 0.394 |
| Tatoeba-test.nya-eng.nya.eng | 38.3 | 0.527 |
| Tatoeba-test.run-eng.run.eng | 26.6 | 0.431 |
| Tatoeba-test.sna-eng.sna.eng | 27.5 | 0.440 |
| Tatoeba-test.swa-eng.swa.eng | 4.6 | 0.195 |
| Tatoeba-test.toi-eng.toi.eng | 16.2 | 0.342 |
| Tatoeba-test.tso-eng.tso.eng | 100.0 | 1.000 |
| Tatoeba-test.umb-eng.umb.eng | 8.4 | 0.231 |
| Tatoeba-test.xho-eng.xho.eng | 37.2 | 0.554 |
| Tatoeba-test.zul-eng.zul.eng | 40.9 | 0.576 |
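
Scores in the table vary widely between languages, so it can help to rank them before drawing conclusions. The sketch below copies the table into a list, sorts it by BLEU, and flags any test set with a perfect score. A BLEU of 100.0 usually merits a look at the test set itself, and this README does not report per-language test set sizes. The labels are the language codes from the test set names, and the function names are hypothetical.

```python
# (test set label, BLEU, chr-F) copied from the benchmark table above.
# Labels are the language codes taken from the test set names.
BENCHMARKS = [
    ("kin", 31.7, 0.481),
    ("lin", 8.3, 0.271),
    ("lug", 5.3, 0.128),
    ("multi", 23.1, 0.394),
    ("nya", 38.3, 0.527),
    ("run", 26.6, 0.431),
    ("sna", 27.5, 0.440),
    ("swa", 4.6, 0.195),
    ("toi", 16.2, 0.342),
    ("tso", 100.0, 1.000),
    ("umb", 8.4, 0.231),
    ("xho", 37.2, 0.554),
    ("zul", 40.9, 0.576),
]


def rank_by_bleu(rows):
    """Return the rows sorted from highest to lowest BLEU."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


def perfect_scores(rows):
    """Return the labels of test sets that report a BLEU of exactly 100.0."""
    return [label for label, bleu, _ in rows if bleu == 100.0]


if __name__ == "__main__":
    for label, bleu, chrf in rank_by_bleu(BENCHMARKS):
        print(f"{label:<6} BLEU={bleu:5.1f} chr-F={chrf:.3f}")
    print("Perfect scores:", perfect_scores(BENCHMARKS))
```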
System Info
| Property | Details |
| --- | --- |
| hf_name | bnt-eng |
| Source Languages | bnt |
| Target Languages | eng |
| Opus Readme URL | [https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/bnt-eng/README.md](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/bnt-eng/README.md) |
| Original Repo | Tatoeba-Challenge |
| Tags | ['translation'] |
| Languages | ['sn', 'zu', 'rw', 'lg', 'ts', 'ln', 'ny', 'xh', 'rn', 'bnt', 'en'] |
| Source Constituents | {'sna', 'zul', 'kin', 'lug', 'tso', 'lin', 'nya', 'xho', 'swh', 'run', 'toi_Latn', 'umb'} |
| Target Constituents | {'eng'} |
| Source Multilingual | True |
| Target Multilingual | False |
| Pre-processing | normalization + SentencePiece (spm32k, spm32k) |
| URL Model | [https://object.pouta.csc.fi/Tatoeba-MT-models/bnt-eng/opus2m-2020-07-31.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/bnt-eng/opus2m-2020-07-31.zip) |
| URL Test Set | [https://object.pouta.csc.fi/Tatoeba-MT-models/bnt-eng/opus2m-2020-07-31.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/bnt-eng/opus2m-2020-07-31.test.txt) |
| Source Alpha3 | bnt |
| Target Alpha3 | eng |
| Short Pair | bnt-en |
| chrF2 Score | 0.39399999999999996 |
| BLEU | 23.1 |
| Brevity Penalty | 1.0 |
| Ref Len | 14565.0 |
| Source Name | Bantu languages |
| Target Name | English |
| Train Date | 2020-07-31 |
| Source Alpha2 | bnt |
| Target Alpha2 | en |
| Prefer Old | False |
| Long Pair | bnt-eng |
| Helsinki Git SHA | 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 |
| Transformers Git SHA | 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b |
| Port Machine | brutasse |
| Port Time | 2020-08-21-14:41 |
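
The Transformers Git SHA and port details above suggest the weights were converted for use with the Hugging Face Transformers library. The following is a minimal sketch of how such a port is typically loaded with the library's `MarianMTModel` and `MarianTokenizer` classes. It rests on assumptions that this README does not confirm: that the model is published on the Hub as `Helsinki-NLP/opus-mt-bnt-en` (built here from the Short Pair value), and that `transformers`, `sentencepiece`, and PyTorch are installed. The helper names `hub_model_id` and `translate` are hypothetical.

```python
def hub_model_id(short_pair: str, org: str = "Helsinki-NLP") -> str:
    """Build the assumed Hub identifier, e.g. 'Helsinki-NLP/opus-mt-bnt-en'."""
    return f"{org}/opus-mt-{short_pair}"


def translate(sentences, model_id=None):
    """Translate a list of Bantu-language sentences into English.

    Downloads the tokenizer and weights on first use (network required).
    """
    # Imported lazily so hub_model_id works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    # Assumption: the Hub name follows the 'opus-mt-<short pair>' pattern.
    model_id = model_id or hub_model_id("bnt-en")
    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    # Tokenize the batch, generate translations, and decode them to text.
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    output_ids = model.generate(**batch)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling `translate` with a list of source sentences returns a list of English strings of the same length. Because the model is multilingual on the source side only, no target-language token is needed in the input under this setup.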
📄 License
This project is licensed under the Apache-2.0 license.