opus-mt-tc-big-fi-en: Open-Source Finnish-to-English Translation Model

Developed by Helsinki-NLP

This is a neural machine translation model for translating from Finnish to English, part of the OPUS-MT project, using a large transformer architecture.

Release Time: 3/22/2022 (Hugging Face port; original model released 2021-12-08)

Model Overview

This model is specifically designed for Finnish-to-English translation tasks, trained on the OPUS multilingual corpus, supporting high-quality text translation.

Model Features

Single translation direction

Translates from Finnish to English only; the reverse direction is not covered by this model.

Large transformer architecture

Built on the transformer (big) architecture.

Extensive data training

Trained on opusTCv20210807+bt, a Tatoeba Challenge data release compiled from OPUS.

Model Capabilities

Text translation

Finnish-to-English translation (one direction only)

Use Cases

Text translation

News translation

Translate Finnish news content into English.

Achieved a BLEU score of 37.3 on the newstest2017 test set.

Daily conversation translation

Translate everyday Finnish conversation into English.

Achieved a BLEU score of 57.4 on the tatoeba-test-v2021-08-07 test set.

🚀 opus-mt-tc-big-fi-en

A neural machine translation model designed to translate text from Finnish (fi) to English (en). This model is part of a broader initiative to make high-quality translation accessible across multiple languages.

🚀 Quick Start

This model translates Finnish to English. It is part of the [OPUS-MT project](https://github.com/Helsinki-NLP/Opus-MT), which aims to provide accessible NMT models for many languages. The models are first trained with [Marian NMT](https://marian-nmt.github.io/), an NMT framework written in C++, and then converted to PyTorch using the Hugging Face transformers library. The training data comes from OPUS, and the training pipelines follow the procedures of [OPUS-MT-train](https://github.com/Helsinki-NLP/Opus-MT-train).

Publications: [OPUS-MT – Building open translation services for the World](https://aclanthology.org/2020.eamt-1.61/) and [The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT](https://aclanthology.org/2020.wmt-1.139/) (please cite them if you use this model).

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

✨ Features

Multilingual Initiative: Part of the OPUS-MT project, which offers translation models for a wide range of languages.
Efficient Training: Trained with the Marian NMT framework, which is written in pure C++ and known for its efficiency.
PyTorch Compatibility: Converted to PyTorch using the Hugging Face transformers library, which makes integration straightforward.

📦 Installation

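No separate installation steps are given. The usage examples below need the Hugging Face transformers library with PyTorch; MarianTokenizer additionally depends on the sentencepiece package.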

💻 Usage Examples

Basic Usage

from transformers import MarianMTModel, MarianTokenizer

# Finnish source sentences to translate
src_text = [
    "Kolme kolmanteen on kaksikymmentäseitsemän.",
    "Heille syntyi poikavauva."
]

# Hugging Face Hub ID of the model (same ID as in the pipeline example)
model_name = "Helsinki-NLP/opus-mt-tc-big-fi-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch with padding, then generate the translations
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

# Decode each generated sequence back to text
for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))
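
For longer inputs, it may help to feed the model a fixed number of sentences at a time rather than one large padded batch. The following is a minimal sketch of that idea, not part of the original model card: the helper names `chunked` and `translate_in_batches` and the default `batch_size` of 8 are assumptions chosen for illustration.

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_in_batches(sentences: List[str], batch_size: int = 8) -> List[str]:
    """Translate Finnish sentences to English one batch at a time.

    Hypothetical helper: the name and the default batch size are
    illustrative assumptions, not values taken from the model card.
    """
    # Imported lazily so that `chunked` stays usable without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    model_name = "Helsinki-NLP/opus-mt-tc-big-fi-en"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    outputs: List[str] = []
    for batch in chunked(sentences, batch_size):
        # Pad only within the current batch, then generate and decode it.
        encoded = tokenizer(batch, return_tensors="pt", padding=True)
        generated = model.generate(**encoded)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling `translate_in_batches(src_text)` with the list from the example above would return the translations as a list of strings. A suitable batch size depends on available memory and sentence length, so the default here is only a starting point.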

Pipeline Usage

from transformers import pipeline

# Load the model through the high-level translation pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-fi-en")
print(pipe("Kolme kolmanteen on kaksikymmentäseitsemän."))
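
The translation pipeline returns a list of dictionaries, each holding the output under the `translation_text` key. The sketch below wraps that in a small function that returns plain strings. The function name `translate_fi_en` and the optional `pipe` parameter are assumptions for illustration; passing in a callable makes it possible to reuse one loaded pipeline or to substitute a stand-in when testing.

```python
from typing import Callable, List, Optional


def translate_fi_en(texts: List[str], pipe: Optional[Callable] = None) -> List[str]:
    """Return English translations of `texts` as plain strings.

    Hypothetical wrapper: if no pipeline is supplied, one is created
    for Helsinki-NLP/opus-mt-tc-big-fi-en on first use.
    """
    if pipe is None:
        # Imported lazily so the function can be exercised with a stand-in.
        from transformers import pipeline

        pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-fi-en")
    results = pipe(texts)
    # Each result is a dict such as {"translation_text": "..."}.
    return [item["translation_text"] for item in results]
```

Creating the pipeline once and passing it in through `pipe` avoids reloading the model on every call.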

📚 Documentation

Model info

Property	Details
Release	2021-12-08
Source Language(s)	fin
Target Language(s)	eng
Model	transformer (big)
Data	opusTCv20210807+bt ([source](https://github.com/Helsinki-NLP/Tatoeba-Challenge))
Tokenization	SentencePiece (spm32k,spm32k)
Original Model	[opusTCv20210807+bt-2021-12-08.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/fin-eng/opusTCv20210807+bt-2021-12-08.zip)
More Info	[OPUS-MT fin-eng README](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/fin-eng/README.md)

Benchmarks

langpair	testset	chr-F	BLEU	#sent	#words
fin-eng	tatoeba-test-v2021-08-07	0.72298	57.4	10690	80552
fin-eng	flores101-devtest	0.62521	35.4	1012	24721
fin-eng	newsdev2015	0.56232	28.6	1500	32012
fin-eng	newstest2015	0.57469	29.9	1370	27270
fin-eng	newstest2016	0.60715	34.3	3000	62945
fin-eng	newstest2017	0.63050	37.3	3002	61846
fin-eng	newstest2018	0.54199	27.1	3000	62325
fin-eng	newstest2019	0.59620	32.7	1996	36215
fin-eng	newstestB2016	0.55472	27.9	3000	62945
fin-eng	newstestB2017	0.58847	31.1	3002	61846
Acknowledgements

The work is supported by the [European Language Grid](https://www.european-language-grid.eu/) as [pilot project 2866](https://live.european-language-grid.eu/catalogue/#/resource/projects/2866), by the [FoTran project](https://www.helsinki.fi/en/researchgroups/natural-language-understanding-with-cross-lingual-grounding), funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and the MeMAD project, funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland.

Model conversion info

Property	Details
Transformers Version	4.16.2
OPUS-MT Git Hash	f084bad
Port Time	Tue Mar 22 14:52:19 EET 2022
Port Machine	LM0-400-22516.local

📄 License

The model is released under the cc-by-4.0 license.
