Opus-MT-TC-Big-En-Tr Open-Source Translation Model - Free Support for English to Turkish Translation

opus-mt-tc-big-en-tr

Developed by Helsinki-NLP

This is a neural machine translation model based on the Transformer architecture, specifically designed for English-to-Turkish translation tasks.

Machine Translation

Transformers

#English-Turkish Translation #High-Precision Translation #Multi-Domain Applicability

Downloads: 108.32k

Release time: 4/13/2022

Model Overview

This model is part of the OPUS-MT project and provides English-to-Turkish translation. It was trained on publicly available parallel corpora from OPUS (the opusTCv20210807 release) plus back-translated data, and it has been evaluated on Tatoeba, FLORES-101, and several WMT news test sets.

Model Features

High-Quality Translation

Scores 42.3 BLEU on tatoeba-test-v2021-08-07 and 31.4 BLEU on flores101-devtest (see Benchmarks).

Language Pair

Supports English-to-Turkish translation tasks.

Open-Source License

Released under the CC-BY-4.0 license, which permits use, modification, and redistribution as long as appropriate credit is given.

Model Capabilities

English-to-Turkish Text Translation

Use Cases

Text Translation

Daily Conversation Translation

Used for translating daily conversations or simple sentences.

Achieved a BLEU score of 42.3 on the tatoeba-test-v2021-08-07 test set.

News Translation

Used for translating news articles or reports.

Achieved a BLEU score of 25.4 on the newstest2017 test set.
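The gap between these two scores partly reflects how different the test sets are. The short calculation below copies the #sent and #words counts from the Benchmarks table and computes the average number of words per sentence; words_per_sentence and TEST_SETS are illustrative names, not part of the model card. Tatoeba sentences turn out to be far shorter on average, which is one plausible factor behind the higher score, though not the only one: FLORES-101 sentences are longer still, yet score higher than the news sets.

```python
# Sentence and word counts copied from the Benchmarks table (#sent, #words)
TEST_SETS = {
    "tatoeba-test-v2021-08-07": (13907, 84364),
    "flores101-devtest": (1012, 20253),
    "newstest2017": (3007, 51977),
}


def words_per_sentence(n_sent: int, n_words: int) -> float:
    """Average number of words per sentence, rounded to one decimal place."""
    return round(n_words / n_sent, 1)


for name, (n_sent, n_words) in TEST_SETS.items():
    print(f"{name}: {words_per_sentence(n_sent, n_words)} words/sentence")
```

This prints 6.1 words per sentence for Tatoeba, 20.0 for FLORES-101, and 17.3 for newstest2017.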

🚀 opus-mt-tc-big-en-tr

A neural machine translation model that translates from English (en) to Turkish (tr). It is part of the OPUS-MT project, an effort to make neural machine translation models widely available for many languages.

🚀 Quick Start

The model was trained with the Marian NMT framework on data from OPUS and then converted to PyTorch, so it can be loaded with the Hugging Face transformers library.

💻 Usage Examples

Basic Usage

from transformers import MarianMTModel, MarianTokenizer

src_text = [
    "I know Tom didn't want to eat that.",
    "On Sundays, we would get up early and go fishing."
]

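# Hugging Face Hub ID of the converted model
model_name = "Helsinki-NLP/opus-mt-tc-big-en-tr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch (padding to the longest sentence) and generate translations
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    # Drop special tokens such as padding and end-of-sentence when decoding
    print(tokenizer.decode(t, skip_special_tokens=True))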

# expected output:
#     Tom'un bunu yemek istemediğini biliyorum.
#     Pazar günleri erkenden kalkıp balık tutmaya giderdik.
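The example above sends every sentence to generate in a single padded batch. For longer inputs, a common pattern is to split the text into smaller batches so that memory use stays bounded. The sketch below shows one way to do that; chunks, translate_in_batches, and the default batch size of 16 are hypothetical choices, not part of the model card, and the translate_batch argument stands in for the tokenize, generate, and decode steps shown above.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_in_batches(
    sentences: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Translate `sentences` batch by batch, preserving the input order.

    `translate_batch` is a hypothetical callable; with the tokenizer and model
    loaded above it could be written as:
        lambda batch: tokenizer.batch_decode(
            model.generate(**tokenizer(batch, return_tensors="pt", padding=True)),
            skip_special_tokens=True,
        )
    """
    results: List[str] = []
    for batch in chunks(sentences, batch_size):
        results.extend(translate_batch(list(batch)))
    return results
```

Keeping the model call behind a plain callable makes the batching logic easy to test without downloading the model.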

Pipeline Usage

from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-tr")
# The pipeline returns a list of dicts; print only the translated text
print(pipe("I know Tom didn't want to eat that.")[0]["translation_text"])

# expected output: Tom'un bunu yemek istemediğini biliyorum.

✨ Features

Language pair: Translates from English to Turkish.
Open-source project: Part of the OPUS-MT project, which promotes the wide availability of NMT models.
Efficient training: Trained with Marian NMT, an efficient NMT framework written in pure C++.

📦 Installation

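Installation consists of setting up the required Python libraries. The examples use transformers with the PyTorch backend, and MarianTokenizer additionally needs the sentencepiece package. All three can be installed via pip:

pip install transformers sentencepiece torch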
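Before running the examples, it can help to confirm that the required packages are importable. The sketch below assumes the examples need transformers, sentencepiece (used by MarianTokenizer), and torch (used by return_tensors="pt"); check_packages is a hypothetical helper built on the standard library's importlib.util.find_spec, which locates a module without importing it.

```python
import importlib.util
from typing import Dict, Iterable

# Packages assumed to be needed by the usage examples; adjust to your setup
REQUIRED = ("transformers", "sentencepiece", "torch")


def check_packages(names: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Return {package_name: is_installed} without importing the packages."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```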

📚 Documentation

Model info

Property	Details
Release	2022-02-25
Source Language(s)	eng
Target Language(s)	tur
Model Type	transformer-big
Training Data	opusTCv20210807+bt (source)
Tokenization	SentencePiece (spm32k,spm32k)
Original Model	opusTCv20210807+bt_transformer-big_2022-02-25.zip
More Information	OPUS-MT eng-tur README
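The Original Model file name appears to combine three other rows of this table (training data, model type, and release date), separated by underscores. Assuming that naming convention holds, the hypothetical helper below splits such a name back into its fields; it is only a reading aid, not part of any OPUS-MT tooling.

```python
from typing import NamedTuple


class ReleaseInfo(NamedTuple):
    training_data: str
    model_type: str
    release_date: str


def parse_release_name(filename: str) -> ReleaseInfo:
    """Split '<data>_<model>_<date>.zip' into its parts (assumed naming convention)."""
    stem = filename[:-len(".zip")] if filename.endswith(".zip") else filename
    # Raises ValueError if the name does not have exactly three '_'-separated parts
    data, model, date = stem.split("_")
    return ReleaseInfo(data, model, date)


info = parse_release_name("opusTCv20210807+bt_transformer-big_2022-02-25.zip")
print(info)
# ReleaseInfo(training_data='opusTCv20210807+bt', model_type='transformer-big', release_date='2022-02-25')
```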

Benchmarks

langpair	testset	chr-F	BLEU	#sent	#words
eng-tur	tatoeba-test-v2021-08-07	0.68726	42.3	13907	84364
eng-tur	flores101-devtest	0.62829	31.4	1012	20253
eng-tur	newsdev2016	0.58947	21.9	1001	15958
eng-tur	newstest2016	0.57624	23.4	3000	50782
eng-tur	newstest2017	0.58858	25.4	3007	51977
eng-tur	newstest2018	0.57848	22.6	3000	53731
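The table reports two automatic metrics: BLEU (on a 0-100 scale) and chr-F (on a 0-1 scale). For readers unfamiliar with them, the sketch below implements simplified corpus-level versions in plain Python: BLEU as the geometric mean of clipped word n-gram precisions (n from 1 to 4) multiplied by a brevity penalty, and chrF as an F-score with beta = 2 over character n-grams (n from 1 to 6) with whitespace removed. It is an illustration only: it assumes whitespace tokenization, a single reference, and no smoothing, so its numbers will not match the official scores, which are normally produced with a standard tool such as sacreBLEU.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(seq: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n in seq (a token list or a string)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Simplified corpus BLEU (0-100): whitespace tokens, one reference, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def corpus_chrf(hyps: List[str], refs: List[str], max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified corpus chrF (0-1): character n-grams with whitespace removed."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        match = hyp_total = ref_total = 0
        for hyp, ref in zip(hyps, refs):
            h_counts = ngrams("".join(hyp.split()), n)
            r_counts = ngrams("".join(ref.split()), n)
            match += sum((h_counts & r_counts).values())  # & keeps the minimum counts
            hyp_total += sum(h_counts.values())
            ref_total += sum(r_counts.values())
        if hyp_total and ref_total:  # skip orders longer than every sentence
            precisions.append(match / hyp_total)
            recalls.append(match / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

For a hypothesis identical to its reference, such as ["Tom'un bunu yemek istemediğini biliyorum."] scored against itself, corpus_bleu returns 100.0 and corpus_chrf returns 1.0.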

Acknowledgements

The work is supported by the European Language Grid as pilot project 2866, by the FoTran project, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and the MeMAD project, funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland.

Model conversion info

Property	Details
Transformers Version	4.16.2
OPUS-MT Git Hash	3405783
Port Time	Wed Apr 13 18:11:39 EEST 2022
Port Machine	LM0-400-22516.local

📄 License

This model is released under the CC-BY-4.0 license.

Publications

Please cite the following papers if you use this model:

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}
