🚀 msa-msa
This project provides a transformer-align model for translation within the Malay macrolanguage (msa), along with its pre-processing setup and evaluation results.
📚 Documentation
Project Overview
- Source Group: Malay (macrolanguage)
- Target Group: Malay (macrolanguage)
- OPUS Readme: [msa-msa](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/msa-msa/README.md)
Model Details
- Model Type: transformer-align
- Source Languages: ind max_Latn min zlm_Latn zsm_Latn
- Target Languages: ind max_Latn min zlm_Latn zsm_Latn
- Pre-processing: normalization + SentencePiece (spm4k,spm4k)
- Language Token Requirement: A sentence-initial language token is required in the form `>>id<<`, where `id` is a valid target language ID.
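
The language token requirement above can be sketched in Python. This is a minimal illustration, not the authors' own tooling: `add_language_token` and `translate` are hypothetical helper names, and the checkpoint name `Helsinki-NLP/opus-mt-msa-msa` is an assumption based on the `hf_name` of this card and the Hugging Face port it describes. The set of valid IDs is taken from the Target Languages line.

```python
# Target language IDs listed under "Target Languages" in this card.
VALID_TARGET_IDS = {"ind", "max_Latn", "min", "zlm_Latn", "zsm_Latn"}


def add_language_token(sentence: str, target_id: str) -> str:
    """Prepend the sentence-initial >>id<< token (hypothetical helper)."""
    if target_id not in VALID_TARGET_IDS:
        raise ValueError(f"unknown target language ID: {target_id!r}")
    return f">>{target_id}<< {sentence.strip()}"


def translate(sentences, target_id, model_name="Helsinki-NLP/opus-mt-msa-msa"):
    """Translate a list of sentences; model_name is an assumed checkpoint name."""
    # Imported lazily so add_language_token works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(
        [add_language_token(s, target_id) for s in sentences],
        return_tensors="pt",
        padding=True,
    )
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `add_language_token("Saya suka membaca.", "ind")` yields `>>ind<< Saya suka membaca.`; passing an ID outside the listed set raises `ValueError` rather than sending an untagged sentence to the model.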
Download Links
- Original Weights: [opus-2020-06-17.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/msa-msa/opus-2020-06-17.zip)
- Test Set Translations: [opus-2020-06-17.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/msa-msa/opus-2020-06-17.test.txt)
- Test Set Scores: [opus-2020-06-17.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/msa-msa/opus-2020-06-17.eval.txt)
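
The three download links share one prefix and differ only in their suffix. A small sketch of that pattern, assuming it holds for this release (the function name `release_urls` and the dictionary keys are illustrative, not part of any official tool):

```python
BASE_URL = "https://object.pouta.csc.fi/Tatoeba-MT-models"


def release_urls(pair: str = "msa-msa", release: str = "opus-2020-06-17") -> dict:
    """Build the weights, test-translation, and score URLs for one release."""
    prefix = f"{BASE_URL}/{pair}/{release}"
    return {
        "weights": f"{prefix}.zip",
        "test_translations": f"{prefix}.test.txt",
        "test_scores": f"{prefix}.eval.txt",
    }


def download(url: str, destination: str) -> None:
    """Fetch one file to a local path (needs network access)."""
    from urllib.request import urlretrieve

    urlretrieve(url, destination)
```

For example, `download(release_urls()["test_scores"], "opus-2020-06-17.eval.txt")` would fetch the score file listed above.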
Benchmarks
| Property | Details |
|---|---|
| Testset | Tatoeba-test.msa.msa |
| BLEU | 18.6 |
| chr-F | 0.418 |
System Info
| Property | Details |
|---|---|
| hf_name | msa-msa |
| source_languages | msa |
| target_languages | msa |
| opus_readme_url | https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/msa-msa/README.md |
| original_repo | Tatoeba-Challenge |
| tags | ['translation'] |
| languages | ['ms'] |
| src_constituents | {'zsm_Latn', 'ind', 'max_Latn', 'zlm_Latn', 'min'} |
| tgt_constituents | {'zsm_Latn', 'ind', 'max_Latn', 'zlm_Latn', 'min'} |
| src_multilingual | False |
| tgt_multilingual | False |
| prepro | normalization + SentencePiece (spm4k,spm4k) |
| url_model | https://object.pouta.csc.fi/Tatoeba-MT-models/msa-msa/opus-2020-06-17.zip |
| url_test_set | https://object.pouta.csc.fi/Tatoeba-MT-models/msa-msa/opus-2020-06-17.test.txt |
| src_alpha3 | msa |
| tgt_alpha3 | msa |
| short_pair | ms-ms |
| chrF2_score | 0.418 |
| bleu | 18.6 |
| brevity_penalty | 1.0 |
| ref_len | 6029.0 |
| src_name | Malay (macrolanguage) |
| tgt_name | Malay (macrolanguage) |
| train_date | 2020-06-17 |
| src_alpha2 | ms |
| tgt_alpha2 | ms |
| prefer_old | False |
| long_pair | msa-msa |
| helsinki_git_sha | 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535 |
| transformers_git_sha | 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b |
| port_machine | brutasse |
| port_time | 2020-08-21-14:41 |
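
The `brevity_penalty` and `ref_len` entries can be read together using the standard BLEU definition of the brevity penalty. The sketch below assumes the scores were computed with that standard definition; the evaluation script itself is not shown in this card, and the hypothesis lengths used in the comments are made up for illustration.

```python
import math


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty: 1 if the output is at least as long
    as the reference, otherwise exp(1 - ref_len / hyp_len)."""
    if hyp_len <= 0:
        return 0.0
    if hyp_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


REF_LEN = 6029  # ref_len from the table above

# Any hypothesis length at or above the reference length gives 1.0.
print(brevity_penalty(6029, REF_LEN))
# A shorter (hypothetical) output would be penalized below 1.0.
print(brevity_penalty(5000, REF_LEN))
```

Under this definition, the reported value of 1.0 indicates that the system output on the test set was at least as long as the reference.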
📄 License
This project is licensed under the Apache-2.0 license.