opus-mt-tc-big-it-en: Open-Source Italian-to-English Translation Model

Developed by Helsinki-NLP

This is a neural machine translation model for Italian-to-English translation, part of the OPUS-MT project, using the transformer-big architecture.

Machine Translation

Transformers

Tags: Italian-English translation, High BLEU score, Multi-domain applicability

Ported to Transformers: 2022-04-13 (original model release: 2022-02-25)

Model Overview

This model translates Italian text into English. It uses the transformer-big architecture and was trained with the Marian NMT framework.

Model Features

High-performance translation

Scores 72.1 BLEU on tatoeba-test-v2021-08-07 and between 32.8 and 34.4 BLEU on the flores101-devtest, newssyscomb2009, and newstest2009 test sets.

Multi-dataset training

Trained on the opusTCv20210807+bt data set, which is drawn from the OPUS collection of parallel corpora and covers text from several domains.

Open-source license

Released under the CC BY 4.0 license, which permits use, sharing, and adaptation provided that attribution is given.

Model Capabilities

Italian-to-English text translation

Supports translation across multiple text domains

Use Cases

Text translation

Daily conversation translation

Translates Italian daily conversations into English.

Achieved a BLEU score of 72.1 on the tatoeba-test-v2021-08-07 dataset.

News translation

Translates Italian news content into English.

Achieved a BLEU score of 34.3 on the newstest2009 dataset.

🚀 opus-mt-tc-big-it-en

A neural machine translation model designed to translate text from Italian (it) to English (en).

This model is part of the OPUS-MT project, which aims to make neural machine translation models widely available and accessible for many languages. All models are first trained with Marian NMT, an efficient NMT framework written in pure C++, and then converted to PyTorch using the transformers library by Hugging Face. The training data comes from OPUS, and the training pipelines follow the procedures of OPUS-MT-train.

Publications: OPUS-MT – Building open translation services for the World and The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT (Please cite these publications if you use this model.)

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

📚 Documentation

Model Info

Property	Details
Release	2022-02-25
Source Language(s)	ita
Target Language(s)	eng
Model Type	transformer-big
Training Data	opusTCv20210807+bt (source)
Tokenization	SentencePiece (spm32k,spm32k)
Original Model	opusTCv20210807+bt_transformer-big_2022-02-25.zip
More Info on Released Models	OPUS-MT ita-eng README
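
In the table above, the original model archive name appears to combine the training data, the model type, and the release date, separated by underscores. The sketch below splits the name on that observed pattern. It is an illustration only: `split_release_name` is a hypothetical helper, and the naming scheme is inferred from this table rather than taken from a documented specification.

```python
from typing import Dict


def split_release_name(filename: str) -> Dict[str, str]:
    """Split an archive name of the form <data>_<model-type>_<date>.zip.

    The pattern is an assumption inferred from the Model Info table.
    """
    stem = filename[:-len(".zip")] if filename.endswith(".zip") else filename
    # Split from the right so underscores inside the data name would survive.
    training_data, model_type, release_date = stem.rsplit("_", 2)
    return {
        "training_data": training_data,
        "model_type": model_type,
        "release_date": release_date,
    }


info = split_release_name("opusTCv20210807+bt_transformer-big_2022-02-25.zip")
for key, value in info.items():
    print(f"{key}: {value}")
```

The three fields recovered this way match the Training Data, Model Type, and Release rows of the table.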

Usage

💻 Usage Examples

Basic Usage

from transformers import MarianMTModel, MarianTokenizer

src_text = [
    "So chi è il mio nemico.",
    "Tom è illetterato; non capisce assolutamente nulla."
]

# Hub ID of the model (published under the Helsinki-NLP organization)
model_name = "Helsinki-NLP/opus-mt-tc-big-it-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch with padding, then generate the translations
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

# expected output:
#     I know who my enemy is.
#     Tom is illiterate; he understands absolutely nothing.
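
For longer lists of sentences, it may help to translate in fixed-size batches instead of passing everything to `generate` in one call. The following is a minimal sketch and not part of the original model card: `chunked`, `translate_all`, and `make_marian_batch_translator` are hypothetical helper names, and the default batch size of 8 is an arbitrary assumption. The translation step is passed in as a callable, so the batching logic can be tried without loading the model.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_all(
    sentences: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,  # arbitrary value for this sketch; tune for your hardware
) -> List[str]:
    """Translate `sentences` batch by batch, preserving the input order."""
    results: List[str] = []
    for batch in chunked(sentences, batch_size):
        results.extend(translate_batch(batch))
    return results


def make_marian_batch_translator(
    model_name: str = "Helsinki-NLP/opus-mt-tc-big-it-en",
) -> Callable[[List[str]], List[str]]:
    """Build a batch translator backed by MarianMT (downloads the model on first use)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate_batch(batch: List[str]) -> List[str]:
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**encoded)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate_batch
```

With the model available, a call such as `translate_all(src_text, make_marian_batch_translator())` would translate the example sentences above in batches.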

Pipeline Usage

from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-it-en")
print(pipe("So chi è il mio nemico."))

# expected output: [{'translation_text': 'I know who my enemy is.'}]
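
The translation pipeline returns a list with one dictionary per input, and each dictionary holds the translated string under the `translation_text` key. The sketch below shows one way to reduce that result to plain strings. `extract_translations` is a hypothetical helper name, and `sample_output` is a hand-written stand-in for the pipeline result, so the snippet runs without downloading the model.

```python
from typing import Dict, List


def extract_translations(pipeline_output: List[Dict[str, str]]) -> List[str]:
    """Pull the translated strings out of a translation pipeline result."""
    return [item["translation_text"] for item in pipeline_output]


# Hand-written stand-in with the same shape as the pipeline result
# for the single-sentence call shown above.
sample_output = [{"translation_text": "I know who my enemy is."}]

for line in extract_translations(sample_output):
    print(line)
```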

Benchmarks

Test set translations: opusTCv20210807+bt_transformer-big_2022-02-25.test.txt
Test set scores: opusTCv20210807+bt_transformer-big_2022-02-25.eval.txt
Benchmark results: benchmark_results.txt
Benchmark output: benchmark_translations.zip

langpair	testset	chr-F	BLEU	#sent	#words
ita-eng	tatoeba-test-v2021-08-07	0.82288	72.1	17320	119214
ita-eng	flores101-devtest	0.62115	32.8	1012	24721
ita-eng	newssyscomb2009	0.59822	34.4	502	11818
ita-eng	newstest2009	0.59646	34.3	2525	65399
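
The benchmark table is tab-separated, so it can be loaded with a few lines of standard Python. The sketch below copies the rows above into a string, ranks the test sets by BLEU, and derives the average sentence length from the `#sent` and `#words` columns. `BenchmarkRow` and `parse_benchmarks` are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import List

# Rows copied from the benchmark table above (tab-separated).
BENCHMARK_TSV = (
    "langpair\ttestset\tchr-F\tBLEU\t#sent\t#words\n"
    "ita-eng\ttatoeba-test-v2021-08-07\t0.82288\t72.1\t17320\t119214\n"
    "ita-eng\tflores101-devtest\t0.62115\t32.8\t1012\t24721\n"
    "ita-eng\tnewssyscomb2009\t0.59822\t34.4\t502\t11818\n"
    "ita-eng\tnewstest2009\t0.59646\t34.3\t2525\t65399\n"
)


@dataclass
class BenchmarkRow:
    testset: str
    chrf: float
    bleu: float
    sentences: int
    words: int

    @property
    def words_per_sentence(self) -> float:
        """Average number of words per sentence in the test set."""
        return self.words / self.sentences


def parse_benchmarks(tsv: str) -> List[BenchmarkRow]:
    """Parse the tab-separated benchmark table, skipping the header row."""
    rows = []
    for line in tsv.strip().splitlines()[1:]:
        _langpair, testset, chrf, bleu, sent, words = line.split("\t")
        rows.append(BenchmarkRow(testset, float(chrf), float(bleu), int(sent), int(words)))
    return rows


rows = parse_benchmarks(BENCHMARK_TSV)
for row in sorted(rows, key=lambda r: r.bleu, reverse=True):
    print(f"{row.testset:26s} BLEU={row.bleu:5.1f}  words/sent={row.words_per_sentence:5.1f}")
```

By these figures, the Tatoeba sentences average about 7 words while the news and FLORES sentences average roughly 24 to 26, which may account for part of the gap in BLEU between the test sets.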

Acknowledgements

The work is supported by the European Language Grid as pilot project 2866, by the FoTran project, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and the MeMAD project, funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland.

Model Conversion Info

Property	Details
Transformers Version	4.16.2
OPUS-MT Git Hash	3405783
Port Time	Wed Apr 13 19:40:08 EEST 2022
Port Machine	LM0-400-22516.local

📄 License

This model is licensed under cc-by-4.0.
