Opus MT Mul-En
This is a Transformer-based multilingual-to-English machine translation model supporting over 100 languages.
Downloads: 173.61k
Release Time: 3/2/2022
Model Overview
This model translates text from many source languages into English. It uses normalization-based preprocessing and SentencePiece tokenization, making it suitable for large-scale multilingual translation scenarios.
Model Features
Extensive language support
Supports translation tasks for over 100 languages, covering major global language families and regional languages.
Standardized preprocessing
Applies normalization to input text, improving consistency and translation quality.
SentencePiece tokenization
Uses spm32k SentencePiece tokenization to effectively process multilingual texts.
Model Capabilities
Multilingual text translation
Large-scale parallel translation
Cross-lingual information conversion
Use Cases
Cross-language communication
Multilingual content localization
Translate non-English content into English for global dissemination.
Academic research
Multilingual literature translation
Assist researchers in obtaining English versions of non-English academic materials.
🚀 Multi-language to English Translation Model
This project provides a transformer-based model that translates a wide range of source languages into English.
✨ Features
- Wide Language Support: Translates a vast number of source languages, including Arabic, Chinese, French, German, and Spanish, into English.
- Transformer Model: Uses the transformer architecture, known for its strong performance on natural language processing tasks.
- Pre-processing: Applies normalization and SentencePiece (spm32k) pre-processing to improve translation quality; see the tokenizer sketch below.
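As a rough illustration of the SentencePiece step (a sketch, not from the original README, assuming the checkpoint is published on the Hugging Face Hub as `Helsinki-NLP/opus-mt-mul-en`), the tokenizer bundled with the checkpoint applies the spm32k model transparently:

```python
from transformers import MarianTokenizer

# Assumed Hub id; the bundled spm32k SentencePiece model is loaded with it.
tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-mul-en")

# Input is segmented into subword pieces from a shared ~32k vocabulary,
# which is what lets a single model accept 100+ source languages.
print(tokenizer.tokenize("Tämä on suomenkielinen lause."))
print(tokenizer.vocab_size)
```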
📦 Installation
No installation steps are provided in the original README.
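In practice (an assumption, not part of the original README), the checkpoint is consumed through the Hugging Face transformers library, so a typical setup is `pip install transformers sentencepiece`; the sentencepiece package is required by the Marian tokenizer.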
💻 Usage Examples
No code examples are provided in the original README.
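As a minimal sketch (not from the original README, and assuming the checkpoint is published on the Hugging Face Hub as `Helsinki-NLP/opus-mt-mul-en`), translation with the Marian classes looks like this. Since the target side is fixed to English, no target-language token is needed:

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-mul-en"  # assumed Hub id
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Sentences in different source languages can share one batch,
# because the model translates everything into English.
src_texts = [
    "Ceci est une phrase en français.",  # French
    "Dies ist ein deutscher Satz.",      # German
]
batch = tokenizer(src_texts, return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```

The same checkpoint also works through the higher-level API, e.g. `pipeline("translation", model=model_name)`.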
📚 Documentation
Language Information
- Supported Languages: The model supports a large number of languages, including ca, es, os, eo, ro, fy, cy, is, lb, su, an, sq, fr, ht, rm, cv, ig, am, eu, tr, ps, af, ny, ch, uk, sl, lt, tk, sg, ar, lg, bg, be, ka, gd, ja, si, br, mh, km, th, ty, rw, te, mk, or, wo, kl, mr, ru, yo, hu, fo, zh, ti, co, ee, oc, sn, mt, ts, pl, gl, nb, bn, tt, bo, lo, id, gn, nv, hy, kn, to, io, so, vi, da, fj, gv, sm, nl, mi, pt, hi, se, as, ta, et, kw, ga, sv, ln, na, mn, gu, wa, lv, jv, el, my, ba, it, hr, ur, ce, nn, fi, mg, rn, xh, ab, de, cs, he, zu, yi, ml, mul, en.
- Source and Target Languages for the Model:
- Source Languages: abk acm ady afb afh_Latn afr akl_Latn aln amh ang_Latn apc ara arg arq ary arz asm ast avk_Latn awa aze_Latn bak bam_Latn bel bel_Latn ben bho bod bos_Latn bre brx brx_Latn bul bul_Latn cat ceb ces cha che chr chv cjy_Hans cjy_Hant cmn cmn_Hans cmn_Hant cor cos crh crh_Latn csb_Latn cym dan deu dsb dtp dws_Latn egl ell enm_Latn epo est eus ewe ext fao fij fin fkv_Latn fra frm_Latn frr fry fuc fuv gan gcf_Latn gil gla gle glg glv gom gos got_Goth grc_Grek grn gsw guj hat hau_Latn haw heb hif_Latn hil hin hnj_Latn hoc hoc_Latn hrv hsb hun hye iba ibo ido ido_Latn ike_Latn ile_Latn ilo ina_Latn ind isl ita izh jav jav_Java jbo jbo_Cyrl jbo_Latn jdt_Cyrl jpn kab kal kan kat kaz_Cyrl kaz_Latn kek_Latn kha khm khm_Latn kin kir_Cyrl kjh kpv krl ksh kum kur_Arab kur_Latn lad lad_Latn lao lat_Latn lav ldn_Latn lfn_Cyrl lfn_Latn lij lin lit liv_Latn lkt lld_Latn lmo ltg ltz lug lzh lzh_Hans mad mah mai mal mar max_Latn mdf mfe mhr mic min mkd mlg mlt mnw moh mon mri mwl mww mya myv nan nau nav nds niu nld nno nob nob_Hebr nog non_Latn nov_Latn npi nya oci ori orv_Cyrl oss ota_Arab ota_Latn pag pan_Guru pap pau pdc pes pes_Latn pes_Thaa pms pnb pol por ppl_Latn prg_Latn pus quc qya qya_Latn rap rif_Latn roh rom ron rue run rus sag sah san_Deva scn sco sgs shs_Latn shy_Latn sin sjn_Latn slv sma sme smo sna snd_Arab som spa sqi srp_Cyrl srp_Latn stq sun swe swg swh tah tam tat tat_Arab tat_Latn tel tet tgk_Cyrl tha tir tlh_Latn tly_Latn tmw_Latn toi_Latn ton tpw_Latn tso tuk tuk_Latn tur tvl tyv tzl tzl_Latn udm uig_Arab uig_Cyrl ukr umb urd uzb_Cyrl uzb_Latn vec vie vie_Hani vol_Latn vro war wln wol wuu xal xho yid yor yue yue_Hans yue_Hant zho zho_Hans zho_Hant zlm_Latn zsm_Latn zul zza
- Target Language: eng
Model Information
Property | Details |
---|---|
Model Type | Transformer |
Training Data | Not provided in the original README |
Pre-processing | Normalization + SentencePiece (spm32k, spm32k) |
Download Original Weights | [opus2m-2020-08-01.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/mul-eng/opus2m-2020-08-01.zip) |
Test Set Translations | [opus2m-2020-08-01.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/mul-eng/opus2m-2020-08-01.test.txt) |
Test Set Scores | [opus2m-2020-08-01.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/mul-eng/opus2m-2020-08-01.eval.txt) |
Benchmarks
Testset | BLEU | chr-F |
---|---|---|
newsdev2014-hineng.hin.eng | 8.5 | 0.341 |
newsdev2015-enfi-fineng.fin.eng | 16.8 | 0.441 |
newsdev2016-enro-roneng.ron.eng | 31.3 | 0.580 |
newsdev2016-entr-tureng.tur.eng | 16.4 | 0.422 |
newsdev2017-enlv-laveng.lav.eng | 21.3 | 0.502 |
newsdev2017-enzh-zhoeng.zho.eng | 12.7 | 0.409 |
newsdev2018-enet-esteng.est.eng | 19.8 | 0.467 |
newsdev2019-engu-gujeng.guj.eng | 13.3 | 0.385 |
newsdev2019-enlt-liteng.lit.eng | 19.9 | 0.482 |
newsdiscussdev2015-enfr-fraeng.fra.eng | 26.7 | 0.520 |
newsdiscusstest2015-enfr-fraeng.fra.eng | 29.8 | 0.541 |
newssyscomb2009-ceseng.ces.eng | 21.1 | 0.487 |
newssyscomb2009-deueng.deu.eng | 22.6 | 0.499 |
newssyscomb2009-fraeng.fra.eng | 25.8 | 0.530 |
newssyscomb2009-huneng.hun.eng | 15.1 | 0.430 |
newssyscomb2009-itaeng.ita.eng | 29.4 | 0.555 |
newssyscomb2009-spaeng.spa.eng | 26.1 | 0.534 |
news-test2008-deueng.deu.eng | 21.6 | 0.491 |
news-test2008-fraeng.fra.eng | 22.3 | 0.502 |
news-test2008-spaeng.spa.eng | 23.6 | 0.514 |
newstest2009-ceseng.ces.eng | 19.8 | 0.480 |
newstest2009-deueng.deu.eng | 20.9 | 0.487 |
newstest2009-fraeng.fra.eng | 25.0 | 0.523 |
newstest2009-huneng.hun.eng | 14.7 | 0.425 |
newstest2009-itaeng.ita.eng | 27.6 | 0.542 |
newstest2009-spaeng.spa.eng | 25.7 | 0.530 |
newstest2010-ceseng.ces.eng | 20.6 | 0.491 |
newstest2010-deueng.deu.eng | 23.4 | 0.517 |
newstest2010-fraeng.fra.eng | 26.1 | 0.537 |
newstest2010-spaeng.spa.eng | 29.1 | 0.561 |
newstest2011-ceseng.ces.eng | 21.0 | 0.489 |
newstest2011-deueng.deu.eng | 21.3 | 0.494 |
newstest2011-fraeng.fra.eng | 26.8 | 0.546 |
newstest2011-spaeng.spa.eng | 28.2 | 0.549 |
newstest2012-ceseng.ces.eng | 20.5 | 0.485 |
newstest2012-deueng.deu.eng | 22.3 | 0.503 |
newstest2012-fraeng.fra.eng | 27.5 | 0.545 |
newstest2012-ruseng.rus.eng | 26.6 | 0.532 |
newstest2012-spaeng.spa.eng | 30.3 | 0.567 |
newstest2013-ceseng.ces.eng | 22.5 | 0.498 |
newstest2013-deueng.deu.eng | 25.0 | 0.518 |
newstest2013-fraeng.fra.eng | 27.4 | 0.537 |
newstest2013-ruseng.rus.eng | 21.6 | 0.484 |
newstest2013-spaeng.spa.eng | 28.4 | 0.555 |
newstest2014-csen-ceseng.ces.eng | 24.0 | 0.517 |
newstest2014-deen-deueng.deu.eng | 24.1 | 0.511 |
newstest2014-fren-fraeng.fra.eng | 29.1 | 0.563 |
newstest2014-hien-hineng.hin.eng | 14.0 | 0.414 |
newstest2014-ruen-ruseng.rus.eng | 24.0 | 0.521 |
newstest2015-encs-ceseng.ces.eng | 21.9 | 0.481 |
newstest2015-ende-deueng.deu.eng | 25.5 | 0.519 |
newstest2015-enfi-fineng.fin.eng | 17.4 | 0.441 |
newstest2015-enru-ruseng.rus.eng | 22.4 | 0.494 |
newstest2016-encs-ceseng.ces.eng | 23.0 | 0.500 |
newstest2016-ende-deueng.deu.eng | 30.1 | 0.560 |
newstest2016-enfi-fineng.fin.eng | 18.5 | 0.461 |
newstest2016-enro-roneng.ron.eng | 29.6 | 0.562 |
newstest2016-enru-ruseng.rus.eng | 22.0 | 0.495 |
newstest2016-entr-tureng.tur.eng | 14.8 | 0.415 |
newstest2017-encs-ceseng.ces.eng | 20.2 | 0.475 |
newstest2017-ende-deueng.deu.eng | 26.0 | 0.523 |
newstest2017-enfi-fineng.fin.eng | 19.6 | 0.465 |
newstest2017-enlv-laveng.lav.eng | 16.2 | 0.454 |
newstest2017-enru-ruseng.rus.eng | 24.2 | 0.510 |
newstest2017-entr-tureng.tur.eng | 15.0 | 0.412 |
newstest2017-enzh-zhoeng.zho.eng | 13.7 | 0.412 |
newstest2018-encs-ceseng.ces.eng | 21.2 | 0.486 |
newstest2018-ende-deueng.deu.eng | 31.5 | 0.564 |
newstest2018-enet-esteng.est.eng | 19.7 | 0.473 |
newstest2018-enfi-fineng.fin.eng | 15.1 | 0.418 |
newstest2018-enru-ruseng.rus.eng | 21.3 | 0.490 |
newstest2018-entr-tureng.tur.eng | 15.4 | 0.421 |
newstest2018-enzh-zhoeng.zho.eng | 12.9 | 0.408 |
newstest2019-deen-deueng.deu.eng | 27.0 | 0.529 |
newstest2019-fien-fineng.fin.eng | 17.2 | 0.438 |
newstest2019-guen-gujeng.guj.eng | 9.0 | 0.342 |
newstest2019-lten-liteng.lit.eng | 22.6 | 0.512 |
newstest2019-ruen-ruseng.rus.eng | 24.1 | 0.503 |
newstest2019-zhen-zhoeng.zho.eng | 13.9 | 0.427 |
newstestB2016-enfi-fineng.fin.eng | 15.2 | 0.428 |
newstestB2017-enfi-fineng.fin.eng | 16.8 | 0.442 |
newstestB2017-fien-fineng.fin.eng | 16.8 | 0.442 |
Tatoeba-test.abk-eng.abk.eng | 2.4 | 0.190 |
Tatoeba-test.ady-eng.ady.eng | 1.1 | 0.111 |
Tatoeba-test.afh-eng.afh.eng | 1.7 | 0.108 |
Tatoeba-test.afr-eng.afr.eng | 53.0 | 0.672 |
Tatoeba-test.akl-eng.akl.eng | 5.9 | 0.239 |
Tatoeba-test.amh-eng.amh.eng | 25.6 | 0.464 |
Tatoeba-test.ang-eng.ang.eng | 11.7 | 0.289 |
Tatoeba-test.ara-eng.ara.eng | 26.4 | 0.443 |
Tatoeba-test.arg-eng.arg.eng | 35.9 | 0.473 |
Tatoeba-test.asm-eng.asm.eng | 19.8 | 0.365 |
Tatoeba-test.ast-eng.ast.eng | 31.8 | 0.467 |
Tatoeba-test.avk-eng.avk.eng | 0.4 | 0.119 |
Tatoeba-test.awa-eng.awa.eng | 9.7 | 0.271 |
Tatoeba-test.aze-eng.aze.eng | 37.0 | 0.542 |
Tatoeba-test.bak-eng.bak.eng | 13.9 | 0.395 |
Tatoeba-test.bam-eng.bam.eng | 2.2 | 0.094 |
Tatoeba-test.bel-eng.bel.eng | 36.8 | 0.549 |
Tatoeba-test.ben-eng.ben.eng | 39.7 | 0.546 |
Tatoeba-test.bho-eng.bho.eng | 33.6 | 0.540 |
Tatoeba-test.bod-eng.bod.eng | 1.1 | 0.147 |
Tatoeba-test.bre-eng.bre.eng | 14.2 | 0.303 |
Tatoeba-test.brx-eng.brx.eng | 1.7 | 0.130 |
Tatoeba-test.bul-eng.bul.eng | 46.0 | 0.621 |
Tatoeba-test.cat-eng.cat.eng | 46.6 | 0.636 |
Tatoeba-test.ceb-eng.ceb.eng | 17.4 | 0.347 |
Tatoeba-test.ces-eng.ces.eng | 41.3 | 0.586 |
Tatoeba-test.cha-eng.cha.eng | 7.9 | 0.232 |
Tatoeba-test.che-eng.che.eng | 0.7 | 0.104 |
Tatoeba-test.chm-eng.chm.eng | 7.3 | 0.261 |
Tatoeba-test.chr-eng.chr.eng | 8.8 | 0.244 |
Tatoeba-test.chv-eng.chv.eng | 11.0 | 0.319 |
Tatoeba-test.cor-eng.cor.eng | 5.4 | 0.204 |
Tatoeba-test.cos-eng.cos.eng | 58.2 | 0.643 |
Tatoeba-test.crh-eng.crh.eng | 26.3 | 0.399 |
Tatoeba-test.csb-eng.csb.eng | 18.8 | 0.389 |
Tatoeba-test.cym-eng.cym.eng | 23.4 | 0.407 |
Tatoeba-test.dan-eng.dan.eng | 50.5 | 0.659 |
Tatoeba-test.deu-eng.deu.eng | 39.6 | 0.579 |
Tatoeba-test.dsb-eng.dsb.eng | 24.3 | 0.449 |
Tatoeba-test.dtp-eng.dtp.eng | 1.0 | 0.149 |
Tatoeba-test.dws-eng.dws.eng | 1.6 | 0.061 |
Tatoeba-test.egl-eng.egl.eng | 7.6 | 0.236 |
Tatoeba-test.ell-eng.ell.eng | 55.4 | 0.682 |
Tatoeba-test.enm-eng.enm.eng | 28.0 | 0.489 |
Tatoeba-test.epo-eng.epo.eng | 41.8 | 0.591 |
Tatoeba-test.est-eng.est.eng | 41.5 | 0.581 |
Tatoeba-test.eus-eng.eus.eng | 37.8 | 0.557 |
Tatoeba-test.ewe-eng.ewe.eng | 10.7 | 0.262 |
Tatoeba-test.ext-eng.ext.eng | 25.5 | 0.405 |
Tatoeba-test.fao-eng.fao.eng | 28.7 | 0.469 |
Tatoeba-test.fas-eng.fas.eng | 7.5 | 0.281 |
Tatoeba-test.fij-eng.fij.eng | 24.2 | 0.320 |
Tatoeba-test.fin-eng.fin.eng | 35.8 | 0.534 |
Tatoeba-test.fkv-eng.fkv.eng | 15.5 | 0.434 |
Tatoeba-test.fra-eng.fra.eng | 45.1 | 0.618 |
Tatoeba-test.frm-eng.frm.eng | 29.6 | 0.427 |
Tatoeba-test.frr-eng.frr.eng | 5.5 | 0.138 |
Tatoeba-test.fry-eng.fry.eng | 25.3 | 0.455 |
Tatoeba-test.ful-eng.ful.eng | 1.1 | 0.127 |
Tatoeba-test.gcf-eng.gcf.eng | 16.0 | 0.315 |
Tatoeba-test.gil-eng.gil.eng | 46.7 | 0.587 |
Tatoeba-test.gla-eng.gla.eng | 20.2 | 0.358 |
Tatoeba-test.gle-eng.gle.eng | 43.9 | 0.592 |
Tatoeba-test.glg-eng.glg.eng | 45.1 | 0.623 |
Tatoeba-test.glv-eng.glv.eng | 3.3 | 0.119 |
Tatoeba-test.gos-eng.gos.eng | 20.1 | 0.364 |
Tatoeba-test.got-eng.got.eng | 0.1 | 0.041 |
Tatoeba-test.grc-eng.grc.eng | 2.1 | 0.137 |
Tatoeba-test.grn-eng.grn.eng | 1.7 | 0.152 |
Tatoeba-test.gsw-eng.gsw.eng | 18.2 | 0.334 |
Tatoeba-test.guj-eng.guj.eng | 21.7 | 0.373 |
Tatoeba-test.hat-eng.hat.eng | 34.5 | 0.502 |
Tatoeba-test.hau-eng.hau.eng | 10.5 | 0.295 |
Tatoeba-test.haw-eng.haw.eng | 2.8 | 0.160 |
Tatoeba-test.hbs-eng.hbs.eng | 46.7 | 0.623 |
Tatoeba-test.heb-eng.heb.eng | 33.0 | 0.492 |
Tatoeba-test.hif-eng.hif.eng | 17.0 | 0.391 |
Tatoeba-test.hil-eng.hil.eng | 16.0 | 0.339 |
Tatoeba-test.hin-eng.hin.eng | 36.4 | 0.533 |
Tatoeba-test.hmn-eng.hmn.eng | 0.4 | 0.131 |
Tatoeba-test.hoc-eng.hoc.eng | 0.7 | 0.132 |
Tatoeba-test.hsb-eng.hsb.eng | 41.9 | 0.551 |
Tatoeba-test.hun-eng.hun.eng | 33.2 | 0.510 |
Tatoeba-test.hye-eng.hye.eng | 32.2 | 0.487 |
Tatoeba-test.iba-eng.iba.eng | 9.4 | 0.278 |
Tatoeba-test.ibo-eng.ibo.eng | 5.8 | 0.200 |
Tatoeba-test.ido-eng.ido.eng | 31.7 | 0.503 |
Tatoeba-test.iku-eng.iku.eng | 9.1 | 0.164 |
Tatoeba-test.ile-eng.ile.eng | 42.2 | 0.595 |
Tatoeba-test.ilo-eng.ilo.eng | 29.7 | 0.485 |
Tatoeba-test.ina-eng.ina.eng | 42.1 | 0.607 |
Tatoeba-test.isl-eng.isl.eng | 35.7 | 0.527 |
Tatoeba-test.ita-eng.ita.eng | 54.8 | 0.686 |
Tatoeba-test.izh-eng.izh.eng | 28.3 | 0.526 |
Tatoeba-test.jav-eng.jav.eng | 10.0 | 0.282 |
Tatoeba-test.jbo-eng.jbo.eng | 0.3 | 0.115 |
Tatoeba-test.jdt-eng.jdt.eng | 5.3 | 0.140 |
Tatoeba-test.jpn-eng.jpn.eng | 18.8 | 0.387 |
Tatoeba-test.kab-eng.kab.eng | 3.9 | 0.205 |
Tatoeba-test.kal-eng.kal.eng | 16.9 | 0.329 |
Tatoeba-test.kan-eng.kan.eng | 16.2 | 0.374 |
Tatoeba-test.kat-eng.kat.eng | 31.1 | 0.493 |
Tatoeba-test.kaz-eng.kaz.eng | 24.5 | 0.437 |
Tatoeba-test.kek-eng.kek.eng | 7.4 | 0.192 |
Tatoeba-test.kha-eng.kha.eng | 1.0 | 0.154 |
Tatoeba-test.khm-eng.khm.eng | 12.2 | 0.290 |
Tatoeba-test.kin-eng.kin.eng | 22.5 | 0.355 |
Tatoeba-test.kir-eng.kir.eng | 27.2 | 0.470 |
Tatoeba-test.kjh-eng.kjh.eng | 2.1 | 0.129 |
Tatoeba-test.kok-eng.kok.eng | 4.5 | 0.259 |
Tatoeba-test.kom-eng.kom.eng | 1.4 | 0.090 |
License
The project is licensed under the Apache-2.0 license.