eng-cel Translation Model
This project provides a transformer-based model for translating English into Celtic languages, along with its pre-processing steps and per-language test set scores.
Quick Start
The eng-cel model translates English text into Celtic languages. The original weights and test set translations can be downloaded from the links listed under System Info (url_model and url_test_set).
Features
- Language Scope: Translates from English to multiple Celtic languages, including Breton (bre), Cornish (cor), Welsh (cym), Scottish Gaelic (gla), Irish (gle), and Manx (glv).
- Pre-processing: Uses normalization and SentencePiece (spm32k, spm32k).
- Language Token Requirement: A sentence-initial language token in the form of >>id<< (where id is a valid target language ID) is required.
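The language token requirement can be illustrated with a minimal sketch. The helper name `add_language_token` is hypothetical and not part of the released model; the set of IDs is taken from the language list above, and the single space between the token and the sentence is an assumption based on common practice for this token style.

```python
# Target language IDs listed in this README (ISO 639-3 codes).
VALID_TARGET_IDS = {"bre", "cor", "cym", "gla", "gle", "glv"}


def add_language_token(sentence: str, target_id: str) -> str:
    """Prefix a source sentence with the >>id<< token for the target language.

    Hypothetical helper for illustration; not part of the released model.
    """
    if target_id not in VALID_TARGET_IDS:
        raise ValueError(
            f"Unknown target language ID {target_id!r}; "
            f"expected one of {sorted(VALID_TARGET_IDS)}"
        )
    # Assumption: token and text are separated by one space.
    return f">>{target_id}<< {sentence.strip()}"


# Example: prepare the same English sentence for Welsh and Irish.
for lang in ("cym", "gle"):
    print(add_language_token("Good morning.", lang))
```

Validating the ID up front is a design choice for this sketch: it turns a mistyped language code into an immediate error before the text reaches the model.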
Installation
To use this model, you need to download the original weights:
opus2m-2020-08-01.zip
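One way to fetch and unpack the archive is sketched below using only the Python standard library. The URL is the url_model value listed under System Info; the destination directory name `eng-cel` and the function names are assumptions made for this example.

```python
import urllib.request
import zipfile
from pathlib import Path

# Model archive URL as given under System Info (url_model).
MODEL_URL = (
    "https://object.pouta.csc.fi/Tatoeba-MT-models/eng-cel/opus2m-2020-08-01.zip"
)


def archive_name(url: str) -> str:
    """Return the file name component of a download URL."""
    return url.rsplit("/", 1)[-1]


def download_and_extract(url: str = MODEL_URL, dest: str = "eng-cel") -> Path:
    """Download the archive into `dest` and unpack it there.

    `dest` is a hypothetical directory name chosen for this example.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / archive_name(url)
    if not archive_path.exists():  # skip the download if already present
        urllib.request.urlretrieve(url, str(archive_path))
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
    return dest_dir


# Usage (performs a network download, so it is left commented out):
# model_dir = download_and_extract()
```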
Documentation
Model Information
Benchmarks
| testset | BLEU | chr-F |
|---|---|---|
| Tatoeba-test.eng-bre.eng.bre | 11.5 | 0.338 |
| Tatoeba-test.eng-cor.eng.cor | 0.3 | 0.095 |
| Tatoeba-test.eng-cym.eng.cym | 31.0 | 0.549 |
| Tatoeba-test.eng-gla.eng.gla | 7.6 | 0.317 |
| Tatoeba-test.eng-gle.eng.gle | 35.9 | 0.582 |
| Tatoeba-test.eng-glv.eng.glv | 9.9 | 0.454 |
| Tatoeba-test.eng.multi | 18.0 | 0.342 |
System Info
- hf_name: eng-cel
- source_languages: eng
- target_languages: cel
- opus_readme_url: https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-cel/README.md
- original_repo: Tatoeba-Challenge
- tags: ['translation']
- languages: ['en', 'gd', 'ga', 'br', 'kw', 'gv', 'cy', 'cel']
- src_constituents: {'eng'}
- tgt_constituents: {'gla', 'gle', 'bre', 'cor', 'glv', 'cym'}
- src_multilingual: False
- tgt_multilingual: True
- prepro: normalization + SentencePiece (spm32k, spm32k)
- url_model: https://object.pouta.csc.fi/Tatoeba-MT-models/eng-cel/opus2m-2020-08-01.zip
- url_test_set: https://object.pouta.csc.fi/Tatoeba-MT-models/eng-cel/opus2m-2020-08-01.test.txt
- src_alpha3: eng
- tgt_alpha3: cel
- short_pair: en-cel
- chrF2_score: 0.342
- bleu: 18.0
- brevity_penalty: 0.9590000000000001
- ref_len: 45370.0
- src_name: English
- tgt_name: Celtic languages
- train_date: 2020-08-01
- src_alpha2: en
- tgt_alpha2: cel
- prefer_old: False
- long_pair: eng-cel
- helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535
- transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b
- port_machine: brutasse
- port_time: 2020-08-21-14:41
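The brevity_penalty and ref_len fields above relate to each other through the BLEU brevity penalty. The sketch below assumes the standard BLEU definition (penalty 1 when the system output is longer than the reference, otherwise exp(1 - r/c)); whether the reported value was computed exactly this way is an assumption, and the function names are hypothetical. It inverts the formula to estimate the total output length implied by the reported figures.

```python
import math


def brevity_penalty(hyp_len: float, ref_len: float) -> float:
    """Standard BLEU brevity penalty for hypothesis length c and reference length r."""
    if hyp_len <= 0:
        return 0.0
    if hyp_len > ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


def implied_hyp_len(bp: float, ref_len: float) -> float:
    """Invert the penalty for 0 < bp < 1: c = r / (1 - ln(bp))."""
    if not 0.0 < bp < 1.0:
        raise ValueError("only defined for 0 < bp < 1")
    return ref_len / (1.0 - math.log(bp))


REF_LEN = 45370.0  # ref_len from System Info
BP = 0.959         # brevity_penalty from System Info (rounded)

hyp_len = implied_hyp_len(BP, REF_LEN)
print(f"implied hypothesis length: about {hyp_len:.0f} tokens")
print(f"round-trip penalty: {brevity_penalty(hyp_len, REF_LEN):.3f}")
```

A penalty below 1 indicates that, under this definition, the system translations were on average somewhat shorter than the references.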
License
This project is licensed under the Apache-2.0 license.