Opus-mt-tc-big-tr-en Open-source Translation Model - Free and Efficient Translation from Turkish to English

Opus Mt Tc Big Tr En

Developed by Helsinki-NLP

This is a large neural machine translation model based on the Transformer architecture, specifically designed for translating from Turkish to English.

Machine Translation

Transformers

Tags: Turkish-English translation, high-precision machine translation, multi-domain applicability


Release Time: 4/13/2022

Model Overview

This model is part of the OPUS-MT project, aiming to provide high-quality machine translation services for Turkish-to-English translation tasks.

Model Features

High-quality translation

Scores 57.6 BLEU on the Tatoeba test set (tatoeba-test-v2021-08-07) and 37.6 BLEU on the FLORES-101 devtest set.

Multi-domain support

Capable of handling translations for various domains including news and daily conversations.

Open-source license

Released under the CC-BY-4.0 license, which permits both commercial and research use with attribution.

Model Capabilities

Turkish to English text translation

Handling multiple text types (news, conversations, etc.)

Use Cases

Content localization

News translation

Translating Turkish news into English

Achieved a BLEU score of 30.7 on the newstest2018 test set

Education

Language learning assistance

Helping learners understand Turkish content

🚀 opus-mt-tc-big-tr-en

This is a neural machine translation model designed for translating from Turkish (tr) to English (en). It is part of the OPUS-MT project, which aims to make neural machine translation models widely accessible for many languages.

🚀 Quick Start

The model was originally trained with the Marian NMT framework as part of the OPUS-MT project and was converted to PyTorch using the Hugging Face transformers library. It can be loaded either through the MarianMTModel and MarianTokenizer classes or through the translation pipeline; both are shown under Usage Examples.

✨ Features

Language Pair: Translates Turkish (tur) into English (eng).
Open-Source: Part of the open-source OPUS-MT project, making it accessible to the community.
Benchmarked Performance: BLEU and chr-F scores are reported on six test sets covering Tatoeba, FLORES-101, and news data.

📚 Documentation

Model Info

Property	Details
Release	2022-03-17
Source Language(s)	tur
Target Language(s)	eng
Model Type	transformer-big
Training Data	opusTCv20210807+bt (source)
Tokenization	SentencePiece (spm32k,spm32k)
Original Model	opusTCv20210807+bt_transformer-big_2022-03-17.zip
More Info on Released Models	OPUS-MT tur-eng README

Publications

Please cite the following papers if you use this model:

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

💻 Usage Examples

Basic Usage

from transformers import MarianMTModel, MarianTokenizer

src_text = [
    "Allahsızlığı Yayma Kürsüsü başkanıydı.",
    "Tom'a ne olduğunu öğrenin."
]

# Hub identifier of the model (same repository as in the pipeline example).
model_name = "Helsinki-NLP/opus-mt-tc-big-tr-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch with padding, then generate the translations.
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

# expected output:
#     He was the president of the Curse of Spreading Godlessness.
#     Find out what happened to Tom.
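
When translating more than a handful of sentences, it can help to feed the model fixed-size batches instead of one large padded tensor. The sketch below is one possible way to do that, not something prescribed by the model card: `chunked` and `translate_batches` are hypothetical helper names, the batch size of 8 is an arbitrary assumption, and the model identifier is the one used in the pipeline example. The transformers import sits inside the function so that the chunking helper can be used on its own.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_batches(
    sentences: Sequence[str],
    model_name: str = "Helsinki-NLP/opus-mt-tc-big-tr-en",
    batch_size: int = 8,  # assumption: tune to the available memory
) -> List[str]:
    """Translate `sentences` batch by batch, preserving the input order."""
    # Imported lazily so that `chunked` works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    results: List[str] = []
    for batch in chunked(sentences, batch_size):
        encoded = tokenizer(batch, return_tensors="pt", padding=True)
        generated = model.generate(**encoded)
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results
```

Calling `translate_batches(src_text)` with the two sentences above would run a single batch; the helper only changes behavior once the input is longer than `batch_size`.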

Advanced Usage

from transformers import pipeline

pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-tr-en")
# The pipeline returns a list of dicts, one per input, keyed by "translation_text".
print(pipe("Allahsızlığı Yayma Kürsüsü başkanıydı.")[0]["translation_text"])

# expected output: He was the president of the Curse of Spreading Godlessness.

🔧 Technical Details

Benchmarks

Test Set Translations: opusTCv20210807+bt_transformer-big_2022-03-17.test.txt
Test Set Scores: opusTCv20210807+bt_transformer-big_2022-03-17.eval.txt
Benchmark Results: benchmark_results.txt
Benchmark Output: benchmark_translations.zip

langpair	testset	chr-F	BLEU	#sent	#words
tur-eng	tatoeba-test-v2021-08-07	0.71895	57.6	13907	109231
tur-eng	flores101-devtest	0.64152	37.6	1012	24721
tur-eng	newsdev2016	0.58658	32.1	1001	21988
tur-eng	newstest2016	0.56960	29.3	3000	66175
tur-eng	newstest2017	0.57455	29.7	3007	67703
tur-eng	newstest2018	0.58488	30.7	3000	68725
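
The table can also be read programmatically. The sketch below copies the rows above into a Python list and derives two quantities the card does not report directly: the unweighted mean BLEU over the four news test sets, and the average number of words per sentence (the #words column divided by #sent). The names `BenchmarkRow`, `ROWS`, `mean_bleu`, and `words_per_sentence` are illustrative and not part of any OPUS-MT tooling.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass(frozen=True)
class BenchmarkRow:
    testset: str
    chrf: float
    bleu: float
    sentences: int
    words: int


# Values copied from the benchmark table in this card (tur-eng only).
ROWS: List[BenchmarkRow] = [
    BenchmarkRow("tatoeba-test-v2021-08-07", 0.71895, 57.6, 13907, 109231),
    BenchmarkRow("flores101-devtest", 0.64152, 37.6, 1012, 24721),
    BenchmarkRow("newsdev2016", 0.58658, 32.1, 1001, 21988),
    BenchmarkRow("newstest2016", 0.56960, 29.3, 3000, 66175),
    BenchmarkRow("newstest2017", 0.57455, 29.7, 3007, 67703),
    BenchmarkRow("newstest2018", 0.58488, 30.7, 3000, 68725),
]


def mean_bleu(rows: Sequence[BenchmarkRow], prefix: str) -> float:
    """Unweighted mean BLEU over test sets whose name starts with `prefix`."""
    selected = [row.bleu for row in rows if row.testset.startswith(prefix)]
    if not selected:
        raise ValueError(f"no test set starts with {prefix!r}")
    return mean(selected)


def words_per_sentence(row: BenchmarkRow) -> float:
    """Average sentence length in words for one test set."""
    return row.words / row.sentences


if __name__ == "__main__":
    print(f"mean BLEU on news sets: {mean_bleu(ROWS, 'news'):.2f}")
    for row in ROWS:
        print(f"{row.testset}: {words_per_sentence(row):.1f} words/sentence")
```

On these numbers the Tatoeba sentences average under 8 words, while the FLORES-101 and news sets average more than 20. That difference in sentence length may partly explain the gap in BLEU, although the card itself does not make that claim.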

Model Conversion Info

Transformers Version: 4.16.2
OPUS-MT Git Hash: 3405783
Port Time: Wed Apr 13 20:02:48 EEST 2022
Port Machine: LM0-400-22516.local

📄 License

This model is released under the CC-BY-4.0 license.

Acknowledgements

The work is supported by the European Language Grid as pilot project 2866, by the FoTran project, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and the MeMAD project, funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland.
