opus-mt-tc-big-en-fr open-source translation model, enabling high-quality English-to-French translation for free

opus-mt-tc-big-en-fr

Developed by Helsinki-NLP

This is a neural machine translation model based on the Transformer architecture, specifically designed for translating English to French. It is part of the OPUS-MT project, aiming to provide extensive language coverage and easily accessible translation services.

Machine Translation

Transformers

#Supports Multiple Languages #English-French Neural Machine Translation #Large Model Architecture #Multi-domain Adaptation

Release Time: 4/13/2022

Model Overview

The model was trained with the Marian NMT framework and translates English into French across a range of text types and application scenarios.

Model Features

Efficient Translation

Uses the transformer-big architecture for English-to-French translation; scores on standard test sets are listed under Benchmarks.

Broad Coverage

Training data comes from the OPUS project, covering various text types and domains.

Easy to Use

Supports easy invocation and integration via Hugging Face's transformers library.

Model Capabilities

Text Translation

Multi-domain Translation

High-quality Translation

Use Cases

Education

Language Learning

Helps students or language learners quickly translate English texts into French.

Improves learning efficiency and enhances language comprehension.

Business

Document Translation

Used for translating corporate documents, contracts, or reports from English to French.

Saves manual translation costs and improves work efficiency.

🚀 opus-mt-tc-big-en-fr

A neural machine translation model designed for translating English (en) to French (fr), offering a practical solution for cross-language communication.

🚀 Quick Start

opus-mt-tc-big-en-fr is a neural machine translation model for translating English to French. It is part of the OPUS-MT project, which aims to make neural machine translation models accessible across many languages.

✨ Features

Multilingual Accessibility: As part of the OPUS-MT project, it contributes to making neural machine translation available for a wide range of languages.
Efficient Training: Trained using the Marian NMT framework, an efficient NMT implementation in pure C++.
Publication-Backed: Described in the OPUS-MT and Tatoeba Translation Challenge publications cited under Technical Details.

📦 Installation

The original model card gives no dedicated installation steps. The usage examples below require the Hugging Face transformers library.
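
Since no installation steps are listed, a quick environment check can help before running the examples. The sketch below is not part of the original model card. The package list is an assumption inferred from the usage examples and the model's SentencePiece tokenization: transformers, sentencepiece, and torch (needed for return_tensors="pt"). The helper name missing_packages is hypothetical.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Assumed requirements, inferred from the usage examples (not stated in the card).
REQUIRED_PACKAGES = ("transformers", "sentencepiece", "torch")


def missing_packages(names: Iterable[str] = REQUIRED_PACKAGES) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    # Suggest a pip command for whatever is absent.
    print("Missing packages; try: pip install " + " ".join(missing))
else:
    print("All assumed requirements are importable.")
```

The check only confirms that the packages can be located; it does not verify versions or download the model.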

💻 Usage Examples

Basic Usage

from transformers import MarianMTModel, MarianTokenizer

src_text = [
    "The Portuguese teacher is very demanding.",
    "When was your last hearing test?"
]

model_name = "Helsinki-NLP/opus-mt-tc-big-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

# expected output:
#     Le professeur de portugais est très exigeant.
#     Quand a eu lieu votre dernier test auditif ?
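
For longer lists of sentences, it is usually safer to translate in fixed-size batches rather than in one call. The following is a minimal sketch of that idea, not part of the original card. The helper names batched, translate_all, and make_marian_translator are hypothetical, and the default batch size of 8 is an arbitrary assumption. The model is only loaded when make_marian_translator is called.

```python
from typing import Callable, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_all(
    texts: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Translate `texts` batch by batch, preserving the input order."""
    results: List[str] = []
    for batch in batched(texts, batch_size):
        results.extend(translate_batch(batch))
    return results


def make_marian_translator(
    model_name: str = "Helsinki-NLP/opus-mt-tc-big-en-fr",
) -> Callable[[List[str]], List[str]]:
    """Build a batch translator around the model.

    Imports are local so the helpers above work without transformers installed.
    """
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate_batch(batch: List[str]) -> List[str]:
        # Pad to the longest sentence in the batch and truncate overlong inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        output_ids = model.generate(**inputs)
        return tokenizer.batch_decode(output_ids, skip_special_tokens=True)

    return translate_batch
```

A call such as translate_all(src_text, make_marian_translator(), batch_size=2) would then process the list two sentences at a time. Passing the translator in as a function keeps the batching logic easy to test with a stand-in.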

Advanced Usage

from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-fr")
print(pipe("The Portuguese teacher is very demanding."))

# expected output: [{'translation_text': 'Le professeur de portugais est très exigeant.'}]
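
The translation pipeline returns a list with one dictionary per input, each holding the text under the key "translation_text". A small sketch of unpacking that result follows. The helper name extract_translations is hypothetical, and sample_output is a hand-written stand-in built from the expected output above, not something produced by running the model here.

```python
from typing import Dict, List


def extract_translations(pipeline_output: List[Dict[str, str]]) -> List[str]:
    """Pull the translated strings out of a translation pipeline result."""
    return [item["translation_text"] for item in pipeline_output]


# Stand-in for the value returned by pipe(...); written by hand for illustration.
sample_output = [
    {"translation_text": "Le professeur de portugais est très exigeant."},
]

for sentence in extract_translations(sample_output):
    print(sentence)
```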

📚 Documentation

Model info

Property	Details
Release	2022-03-09
Source Language(s)	eng
Target Language(s)	fra
Model Type	transformer-big
Training Data	opusTCv20210807+bt (source)
Tokenization	SentencePiece (spm32k,spm32k)
Original Model	opusTCv20210807+bt_transformer-big_2022-03-09.zip
More Info on Released Models	OPUS-MT eng-fra README

Benchmarks

Test set translations: opusTCv20210807+bt_transformer-big_2022-03-09.test.txt
Test set scores: opusTCv20210807+bt_transformer-big_2022-03-09.eval.txt
Benchmark results: benchmark_results.txt
Benchmark output: benchmark_translations.zip

langpair	testset	chr-F	BLEU	#sent	#words
eng-fra	tatoeba-test-v2021-08-07	0.69621	53.2	12681	106378
eng-fra	flores101-devtest	0.72494	52.2	1012	28343
eng-fra	multi30k_test_2016_flickr	0.72361	52.4	1000	13505
eng-fra	multi30k_test_2017_flickr	0.72826	52.8	1000	12118
eng-fra	multi30k_test_2017_mscoco	0.73547	54.7	461	5484
eng-fra	multi30k_test_2018_flickr	0.66723	43.7	1071	15867
eng-fra	newsdiscussdev2015	0.60471	33.4	1500	27940
eng-fra	newsdiscusstest2015	0.64915	40.3	1500	27975
eng-fra	newssyscomb2009	0.58903	30.7	502	12331
eng-fra	news-test2008	0.55516	27.6	2051	52685
eng-fra	newstest2009	0.57907	30.0	2525	69263
eng-fra	newstest2010	0.60156	33.5	2489	66022
eng-fra	newstest2011	0.61632	35.0	3003	80626
eng-fra	newstest2012	0.59736	32.8	3003	78011
eng-fra	newstest2013	0.59700	34.6	3000	70037
eng-fra	newstest2014	0.66686	41.9	3003	77306
eng-fra	tico19-test	0.63022	40.6	2100	64661
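
To compare results across test sets, the table can be loaded into a small data structure and summarized. The sketch below copies the eng-fra rows from the table and is only an illustration. The names Score, best_by_bleu, worst_by_bleu, and mean_bleu are hypothetical, and the mean is a plain unweighted average over test sets, which is an assumption about how one might aggregate, not a figure reported by the model authors.

```python
from typing import List, NamedTuple


class Score(NamedTuple):
    testset: str
    chrf: float
    bleu: float
    sentences: int
    words: int


# Rows copied from the benchmark table (language pair eng-fra throughout).
BENCHMARKS: List[Score] = [
    Score("tatoeba-test-v2021-08-07", 0.69621, 53.2, 12681, 106378),
    Score("flores101-devtest", 0.72494, 52.2, 1012, 28343),
    Score("multi30k_test_2016_flickr", 0.72361, 52.4, 1000, 13505),
    Score("multi30k_test_2017_flickr", 0.72826, 52.8, 1000, 12118),
    Score("multi30k_test_2017_mscoco", 0.73547, 54.7, 461, 5484),
    Score("multi30k_test_2018_flickr", 0.66723, 43.7, 1071, 15867),
    Score("newsdiscussdev2015", 0.60471, 33.4, 1500, 27940),
    Score("newsdiscusstest2015", 0.64915, 40.3, 1500, 27975),
    Score("newssyscomb2009", 0.58903, 30.7, 502, 12331),
    Score("news-test2008", 0.55516, 27.6, 2051, 52685),
    Score("newstest2009", 0.57907, 30.0, 2525, 69263),
    Score("newstest2010", 0.60156, 33.5, 2489, 66022),
    Score("newstest2011", 0.61632, 35.0, 3003, 80626),
    Score("newstest2012", 0.59736, 32.8, 3003, 78011),
    Score("newstest2013", 0.59700, 34.6, 3000, 70037),
    Score("newstest2014", 0.66686, 41.9, 3003, 77306),
    Score("tico19-test", 0.63022, 40.6, 2100, 64661),
]


def best_by_bleu(rows: List[Score]) -> Score:
    """Return the row with the highest BLEU score."""
    return max(rows, key=lambda row: row.bleu)


def worst_by_bleu(rows: List[Score]) -> Score:
    """Return the row with the lowest BLEU score."""
    return min(rows, key=lambda row: row.bleu)


def mean_bleu(rows: List[Score], prefix: str = "") -> float:
    """Unweighted mean BLEU over test sets whose name starts with `prefix`."""
    selected = [row.bleu for row in rows if row.testset.startswith(prefix)]
    if not selected:
        raise ValueError(f"no test set starts with {prefix!r}")
    return sum(selected) / len(selected)


print("highest BLEU:", best_by_bleu(BENCHMARKS).testset)
print("lowest BLEU:", worst_by_bleu(BENCHMARKS).testset)
print("mean BLEU on newstest2009-2014:", round(mean_bleu(BENCHMARKS, "newstest"), 2))
```

The prefix "newstest" selects only newstest2009 through newstest2014; news-test2008, newssyscomb2009, and the newsdiscuss sets have different name prefixes and are excluded.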

Model conversion info

Property	Details
Transformers Version	4.16.2
OPUS-MT Git Hash	3405783
Port Time	Wed Apr 13 17:07:05 EEST 2022
Port Machine	LM0-400-22516.local

🔧 Technical Details

This model is part of the OPUS-MT project. It was originally trained using the Marian NMT framework and then converted to PyTorch using the transformers library by Hugging Face. The training data is sourced from OPUS, and the training pipelines follow the procedures of OPUS-MT-train.

Publications: "OPUS-MT – Building open translation services for the World" and "The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT" (please cite them if you use this model).

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

📄 License

This model is released under the CC-BY-4.0 license.

Acknowledgements

The work is supported by the European Language Grid as pilot project 2866, by the FoTran project, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and the MeMAD project, funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland.
