Opus-mt-tc-big-bg-en Open Source Translation Model - Achieve Accurate Bulgarian to English Translation for Free

Opus Mt Tc Big Bg En

Developed by Helsinki-NLP

A neural machine translation model for translating from Bulgarian to English, developed based on the OPUS-MT project.

Release Time: 4/13/2022

Model Overview

This model is a neural machine translation model based on the transformer architecture, specifically designed for translating Bulgarian text into English. It is part of the OPUS-MT project, which aims to provide high-quality machine translation services for multiple languages worldwide.

Model Features

High-Quality Translation

Achieves a BLEU score of 42.9 on the flores101-devtest dataset and 60.5 on the tatoeba-test-v2021-08-07 dataset.

Language Pair

Translates in one direction only, from Bulgarian (bul) to English (eng).

Open-Source Project

As part of the OPUS-MT project, the model is fully open-source under the cc-by-4.0 license.

Model Capabilities

Text Translation

Bulgarian to English Translation

Use Cases

Language Services

Document Translation

Translate Bulgarian documents, articles, or web content into English.

Produces English translations suitable for both commercial and personal use.

Educational Assistance

Assist students learning Bulgarian or English with language practice and translation assignments.

Provides accurate translation references to aid language learning.

🚀 opus-mt-tc-big-bg-en

A neural machine translation model for translating from Bulgarian (bg) to English (en). This model is part of the OPUS-MT project, aiming to make neural machine translation models widely available and accessible for many languages.

🚀 Quick Start

This model is designed to translate text from Bulgarian to English. It is part of the OPUS-MT project, was trained with the Marian NMT framework, and was converted to PyTorch using the transformers library.

✨ Features

Multilingual Accessibility: Part of the OPUS-MT project, which makes translation models available for many languages.
Efficient Training: Originally trained using the Marian NMT framework, an efficient NMT implementation in pure C++.
Converted to PyTorch: Converted to PyTorch using the transformers library by Hugging Face for easy integration.

📦 Installation

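The original model card lists no installation steps. The usage examples below need the transformers library with a PyTorch backend; MarianTokenizer also depends on the sentencepiece package.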

💻 Usage Examples

Basic Usage

from transformers import MarianMTModel, MarianTokenizer

src_text = [
    "2001 е годината, с която започва 21-ви век.",
    "Това е Copacabana!"
]

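# Hub ID of the model (same ID as in the pipeline example)
model_name = "Helsinki-NLP/opus-mt-tc-big-bg-en"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch with padding, then generate the translations
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

# Decode each output sequence back into text
for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))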

# expected output:
#     2001 was the year the 21st century began.
#     It's Copacabana!
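
The basic example translates two sentences in a single `generate` call. For longer lists it may help to split the input into smaller batches so that one padded batch does not grow too large. The following is a minimal sketch, not part of the original model card: the helper names `chunked`, `translate_in_batches`, and `make_marian_translate_fn` and the default batch size are illustrative assumptions, and the Hub ID is the one used in the pipeline example.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_in_batches(
    texts: Sequence[str],
    translate_fn: Callable[[List[str]], List[str]],
    batch_size: int = 8,  # assumption: tune to the available memory
) -> List[str]:
    """Translate `texts` batch by batch, preserving the input order."""
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        results.extend(translate_fn(list(batch)))
    return results


def make_marian_translate_fn(
    model_name: str = "Helsinki-NLP/opus-mt-tc-big-bg-en",
) -> Callable[[List[str]], List[str]]:
    """Build a translate function backed by the MarianMT model (hypothetical helper)."""
    # Imported lazily so the batching helpers work without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(batch: List[str]) -> List[str]:
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)

    return translate
```

With the model available, `translate_in_batches(src_text, make_marian_translate_fn())` would return the translations as a list of strings in the same order as the input.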

Advanced Usage

from transformers import pipeline
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-bg-en")
print(pipe("2001 е годината, с която започва 21-ви век."))

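# The pipeline returns a list of dicts; the translated text is under the "translation_text" key.
# expected translation: 2001 was the year the 21st century began.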
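
The translation pipeline wraps each result in a dictionary, so printing the raw return value shows a list of dicts and not a bare string. A small sketch of pulling out only the text follows; the helper name `extract_translations` is illustrative, and `sample_output` is a hand-written stand-in for a real pipeline result that reuses the translation quoted above.

```python
from typing import Dict, List


def extract_translations(pipeline_output: List[Dict[str, str]]) -> List[str]:
    """Return only the translated strings from a translation pipeline result."""
    return [item["translation_text"] for item in pipeline_output]


# Stand-in for the value returned by pipe(...); not produced by running the model here.
sample_output = [{"translation_text": "2001 was the year the 21st century began."}]

for line in extract_translations(sample_output):
    print(line)
```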

📚 Documentation

Model Info

Property	Details
Release	2022-03-09
Source Language(s)	bul
Target Language(s)	eng
Model Type	transformer-big
Training Data	opusTCv20210807+bt (source)
Tokenization	SentencePiece (spm32k,spm32k)
Original Model	opusTCv20210807+bt_transformer-big_2022-03-09.zip
More Information	OPUS-MT bul-eng README
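
The original model archive name appears to combine three of the fields above in one string: training data, model type, and release date. A minimal sketch of splitting it apart is shown below. It assumes the underscore-separated layout seen in this one file name; the pattern and the function name `parse_archive_name` are illustrative and not part of OPUS-MT.

```python
import re

# Assumed layout: <training data>_<model type>_<YYYY-MM-DD>.zip
ARCHIVE_PATTERN = re.compile(
    r"^(?P<training_data>.+?)_(?P<model_type>.+)_(?P<release>\d{4}-\d{2}-\d{2})\.zip$"
)


def parse_archive_name(filename: str) -> dict:
    """Split an archive name into training data, model type, and release date."""
    match = ARCHIVE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename!r}")
    return match.groupdict()


info = parse_archive_name("opusTCv20210807+bt_transformer-big_2022-03-09.zip")
print(info["training_data"], info["model_type"], info["release"])
```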

Benchmarks

langpair	testset	chr-F	BLEU	#sent	#words
bul-eng	tatoeba-test-v2021-08-07	0.73687	60.5	10000	71872
bul-eng	flores101-devtest	0.67938	42.9	1012	24721

Test Set Translations: opusTCv20210807+bt_transformer-big_2022-03-09.test.txt
Test Set Scores: opusTCv20210807+bt_transformer-big_2022-03-09.eval.txt
Benchmark Results: benchmark_results.txt
Benchmark Output: benchmark_translations.zip
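
The chr-F column is a character n-gram F-score. To give a feel for what it measures, here is a simplified sentence-level sketch in plain Python. It is an illustration under stated assumptions (n-gram orders 1 to 6, beta = 2, whitespace ignored, unweighted averaging over orders) and will not reproduce the corpus-level scores in the table; the function names `char_ngrams` and `simple_chrf` are hypothetical, and an established tool such as sacreBLEU should be used for real evaluation.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average n-gram precision and recall, then combine as F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the texts is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


reference = "2001 was the year the 21st century began."
print(simple_chrf(reference, reference))
print(simple_chrf("2001 is the year the 21st century started.", reference))
```

An exact match scores 1.0, text with no shared characters scores 0.0, and a close paraphrase lands in between.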

Publications

Please cite the following publications if you use this model:

@inproceedings{tiedemann-thottingal-2020-opus,
    title = "{OPUS}-{MT} {--} Building open translation services for the World",
    author = {Tiedemann, J{\"o}rg  and Thottingal, Santhosh},
    booktitle = "Proceedings of the 22nd Annual Conference of the European Association for Machine Translation",
    month = nov,
    year = "2020",
    address = "Lisboa, Portugal",
    publisher = "European Association for Machine Translation",
    url = "https://aclanthology.org/2020.eamt-1.61",
    pages = "479--480",
}

@inproceedings{tiedemann-2020-tatoeba,
    title = "The Tatoeba Translation Challenge {--} Realistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.wmt-1.139",
    pages = "1174--1182",
}

🔧 Technical Details

The model was originally trained using the Marian NMT framework, an efficient NMT implementation written in pure C++. It has been converted to PyTorch using the transformers library by Hugging Face. Training data is taken from OPUS, and the training pipeline follows the procedures of OPUS-MT-train.

📄 License

This model is released under the cc-by-4.0 license.

📋 Model Conversion Info

Property	Details
Transformers Version	4.16.2
OPUS-MT Git Hash	3405783
Port Time	Wed Apr 13 18:23:56 EEST 2022
Port Machine	LM0-400-22516.local

Acknowledgements

The work is supported by the European Language Grid as pilot project 2866, by the FoTran project, funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 771113), and the MeMAD project, funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 780069. We are also grateful for the generous computational resources and IT infrastructure provided by CSC -- IT Center for Science, Finland.
