🚀 opus-mt-tc-big-en-fi
A neural machine translation model for translating text from English (en) to Finnish (fi), part of a broader initiative to make high-quality translation models accessible globally.
🚀 Quick Start
This is a neural machine translation model for translating from English to Finnish. It is part of the OPUS-MT project, which aims to make neural machine translation models accessible for many languages. The model was originally trained with Marian NMT and then converted to PyTorch using the transformers library by Hugging Face.
✨ Features
- Multilingual Support: A multilingual translation model with multiple target languages. A sentence-initial language token in the form of `>>id<<` (where `id` is a valid target language ID), e.g. `>>fin<<`, is required; see the sketch after this list.
- Based on OPUS: Training data is sourced from OPUS, and training pipelines follow the procedures of OPUS-MT-train.
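As a minimal illustration of the token format described above (whether a given checkpoint actually needs the token depends on how it was trained; the usage examples below pass plain English text):

```python
# Hypothetical batch of multilingual inputs: each sentence starts with
# the sentence-initial target-language token, e.g. >>fin<< for Finnish.
src_text = [">>fin<< Russia is big.", ">>fin<< Touch wood!"]
```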
📦 Installation
No specific installation steps are provided in the original document.
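As a minimal setup sketch (not from the original card): the usage examples below assume the Hugging Face `transformers` library plus the `sentencepiece` package that Marian tokenizers depend on, e.g. `pip install transformers sentencepiece`.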
💻 Usage Examples
Basic Usage
```python
from transformers import MarianMTModel, MarianTokenizer

src_text = [
    "Russia is big.",
    "Touch wood!"
]

# Local path from the original card; the published Hub ID is
# "Helsinki-NLP/opus-mt-tc-big-en-fi" (see Advanced Usage below).
model_name = "pytorch-models/opus-mt-tc-big-en-fi"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch, generate translations, and decode each output sequence.
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))
```
Advanced Usage
```python
from transformers import pipeline

# The pipeline API downloads the model from the Hugging Face Hub by its ID.
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-fi")
print(pipe("Russia is big."))
```
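The pipeline returns a list with one dictionary per input, each carrying a `translation_text` key (e.g. `[{'translation_text': ...}]`) whose value is the Finnish translation.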
📚 Documentation
Model Info
Publications
Please cite the following publications if you use this model:
- [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/)
- [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/)
```bibtex
@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}
```
Benchmarks
| langpair | testset | chr-F | BLEU | #sent | #words |
|----------|---------|-------|------|-------|--------|
| eng-fin | tatoeba-test-v2021-08-07 | 0.64352 | 39.3 | 10690 | 65122 |
| eng-fin | flores101-devtest | 0.61334 | 27.6 | 1012 | 18781 |
| eng-fin | newsdev2015 | 0.58367 | 24.2 | 1500 | 23091 |
| eng-fin | newstest2015 | 0.60080 | 26.4 | 1370 | 19735 |
| eng-fin | newstest2016 | 0.61636 | 28.8 | 3000 | 47678 |
| eng-fin | newstest2017 | 0.64381 | 31.3 | 3002 | 45269 |
| eng-fin | newstest2018 | 0.55626 | 19.7 | 3000 | 44836 |
| eng-fin | newstest2019 | 0.58420 | 26.4 | 1997 | 38369 |
| eng-fin | newstestB2016 | 0.57554 | 23.3 | 3000 | 45766 |
| eng-fin | newstestB2017 | 0.60212 | 26.8 | 3002 | 45506 |
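The table reports chr-F and BLEU. As a rough, hypothetical sketch of how such scores can be computed for one's own system output (not part of the original card; the sentences are placeholders, and sacrebleu versions differ on whether chrF is reported on a 0-1 or 0-100 scale):

```python
import sacrebleu

# Hypothetical system outputs and reference translations (one sentence each).
hypotheses = ["Venäjä on iso."]
references = [["Venäjä on suuri."]]  # one reference stream

# Corpus-level BLEU and chrF, the two metrics used in the table above.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}, chrF = {chrf.score:.5f}")
```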
🔧 Technical Details
The model is trained using the Marian NMT framework, an efficient NMT implementation written in pure C++. Training data is sourced from OPUS, and the training pipelines follow the procedures of OPUS-MT-train. After training, the model is converted to PyTorch using the transformers library by Hugging Face.
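As an aside not stated in the original card: the transformers repository ships a Marian conversion script (`convert_marian_to_pytorch.py`), which handles this kind of Marian-to-PyTorch port; the Model conversion info section below records the transformers version and OPUS-MT commit used for this particular checkpoint.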
📄 License
This model is released under the CC-BY-4.0 license.
Acknowledgements
The development of this model is supported by multiple projects:
- European Language Grid as pilot project 2866.
- [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113).
- MeMAD project, funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069.
We also appreciate the computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland.
Model conversion info
- Transformers Version: 4.16.2
- OPUS-MT Git Hash: f084bad
- Port Time: Tue Mar 22 14:42:32 EET 2022
- Port Machine: LM0-400-22516.local