# MITRE 913M
MITRE (Multilingual Translation with Registers) is a multilingual, decoder-only model designed for many-to-many translation. It supports direct translation across 552 directions covering 24 languages from 5 language families. This repository lets you use the pre-trained model for inference.
## Quick Start
Before loading the tokenizer, install SentencePiece with `pip install sentencepiece`. Then you can load the tokenizer and the model.
## Usage Examples
### Basic Usage
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("naist-nlp/mitre_913m", trust_remote_code=True, use_fast=False)
model = AutoModel.from_pretrained("naist-nlp/mitre_913m", trust_remote_code=True)
```
### Advanced Usage
To use this model locally and inspect the code, you can clone this hub.
```python
from mitre_913m.tokenization_mitre import MitreTokenizer
from mitre_913m.modeling_mitre import MitreForConditionalGeneration

tokenizer = MitreTokenizer.from_pretrained("mitre_913m")
model = MitreForConditionalGeneration.from_pretrained("mitre_913m")
```
After getting the model and tokenizer objects, you can perform translation.
```python
english_text = "I have a red apple."
chinese_text = "我有一个红苹果。"  # expected translation

model.half()
model.cuda()  # move the model to the GPU before generating on GPU tensors
model.eval()

src_tokens = tokenizer.encode_source_tokens_to_input_ids([english_text], target_language="zh")
generated_tokens = model.generate(src_tokens.cuda())
results = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(results)
```
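The steps above can be wrapped in a small helper. This is only a sketch, not part of the released API: it assumes nothing beyond the `encode_source_tokens_to_input_ids`, `generate`, and `batch_decode` methods shown above, and the `device` handling is our own addition.

```python
def translate(model, tokenizer, texts, target_language, device="cuda"):
    """Translate a batch of texts into target_language.

    Sketch built on the tokenizer/model methods shown above; the
    `device` argument is an assumption, not part of the released API.
    """
    src_tokens = tokenizer.encode_source_tokens_to_input_ids(
        texts, target_language=target_language
    )
    generated = model.generate(src_tokens.to(device))
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, `translate(model, tokenizer, ["I have a red apple."], "zh")` reproduces the single-sentence call above, while passing several sentences translates them as one batch.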
## Documentation
The registering technique is introduced in our paper. If you want to reproduce the data mining and training, please refer to this repository. An alternative version of MITRE with 466M parameters is also available in this repository.
## Technical Details
We generally follow the style of M2M, but make some necessary improvements to reduce generation cost; see the `generate()` implementation in `modeling_mitre.py` for details. Additionally, we plan to implement FlashAttention-2 to further speed up our models and will update the repository as soon as possible.
## License
This project is licensed under the MIT license.
## Languages covered
- Germanic: English (en), German (de), Dutch; Flemish (nl), Swedish (sv), Danish (da), Afrikaans (af)
- Romance: French (fr), Spanish (es), Italian (it), Portuguese (pt), Romanian; Moldavian; Moldovan (ro)
- Slavic: Russian (ru), Czech (cs), Polish (pl), Bulgarian (bg), Ukrainian (uk)
- Malayo-Polynesian: Indonesian (id), Malay (ms), Javanese (jv), Tagalog; Filipino (tl)
- Asian*: Chinese (zh), Japanese (ja), Korean (ko), Vietnamese (vi)
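As a sanity check on the figures in the introduction, the 24 language codes above yield 24 × 23 = 552 many-to-many translation directions:

```python
# Language codes listed above, grouped by family.
FAMILIES = {
    "Germanic": ["en", "de", "nl", "sv", "da", "af"],
    "Romance": ["fr", "es", "it", "pt", "ro"],
    "Slavic": ["ru", "cs", "pl", "bg", "uk"],
    "Malayo-Polynesian": ["id", "ms", "jv", "tl"],
    "Asian": ["zh", "ja", "ko", "vi"],
}

languages = [code for codes in FAMILIES.values() for code in codes]
print(len(languages))                         # 24 languages
print(len(languages) * (len(languages) - 1))  # 552 translation directions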
## BibTeX entry and citation info
```bibtex
@misc{qu2025registeringsourcetokenstarget,
  title={Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation},
  author={Zhi Qu and Yiran Wang and Jiannan Mao and Chenchen Ding and Hideki Tanaka and Masao Utiyama and Taro Watanabe},
  year={2025},
  eprint={2501.02979},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.02979},
}
```