seamless-m4t-v2-large Open-source Multilingual Translation Model - Supports Speech and Text Translation in Nearly 100 Languages

Seamless M4t V2 Large

Developed by audo

SeamlessM4T is a large-scale multilingual multimodal machine translation model supporting speech and text translation in nearly 100 languages.

Text-to-Audio

Safetensors

Supports Multiple Languages #Multilingual speech translation #Real-time voice conversion #Cross-modal translation


Release Time: 12/3/2023

Model Overview

SeamlessM4T is a foundational all-in-one multilingual multimodal machine translation model that delivers high-quality translations for speech and text. It supports multiple tasks including speech-to-speech, speech-to-text, text-to-speech, text-to-text translation, and automatic speech recognition.

Model Features

Multilingual support

Supports speech input in 101 languages, text input/output in 96 languages, and speech output in 35 languages

Multimodal translation

Supports various translation modes including speech-to-speech, speech-to-text, text-to-speech, and text-to-text

High-quality translation

Uses the novel UnitY2 architecture, which improves on SeamlessM4T v1 in both quality and inference speed in speech generation tasks

Fast inference

Significantly improves inference speed through hierarchical character-to-unit upsampling and non-autoregressive text-to-unit decoding

Model Capabilities

Speech recognition

Speech synthesis

Text translation

Speech translation

Multilingual processing

Use Cases

Real-time translation

Conference real-time translation

Provides real-time speech translation services in multinational meetings

Supports two-way translation between multiple languages in real time

Voice assistant

Enables multilingual voice interaction for smart devices

Achieves natural cross-language conversations

Content localization

Video subtitle generation

Automatically generates multilingual video subtitles

Enhances content accessibility

Multilingual podcasts

Translates podcast content into multiple language versions

Expands audience reach

🚀 SeamlessM4T v2

SeamlessM4T is our foundational all-in-one Massively Multilingual and Multimodal Machine Translation model. It delivers high-quality translation for speech and text in nearly 100 languages.

✨ Features

Supported Tasks: SeamlessM4T models support Speech-to-speech translation (S2ST), Speech-to-text translation (S2TT), Text-to-speech translation (T2ST), Text-to-text translation (T2TT), and Automatic speech recognition (ASR).
Language Support:
- 🎤 101 languages for speech input.
- 💬 96 languages for text input/output.
- 🔊 35 languages for speech output.
New Version: We are releasing SeamlessM4T v2, an updated version built on our novel UnitY2 architecture. SeamlessM4T v2 is a multitask adaptation of UnitY2; its hierarchical character-to-unit upsampling and non-autoregressive text-to-unit decoding considerably improve quality and inference speed in speech generation tasks compared to SeamlessM4T v1.
🤗 Transformers Support: SeamlessM4T v2 is also supported by 🤗 Transformers. More details here.
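
The five tasks listed above differ only in the modality of their input and output. As a rough illustration, that mapping can be written as a small lookup table. The `TASKS` dictionary and both helper functions are hypothetical names introduced here for illustration; they are not part of any SeamlessM4T or 🤗 Transformers API.

```python
# Hypothetical lookup table: task abbreviation -> (input modality, output modality).
TASKS = {
    "S2ST": ("speech", "speech"),  # speech-to-speech translation
    "S2TT": ("speech", "text"),    # speech-to-text translation
    "T2ST": ("text", "speech"),    # text-to-speech translation
    "T2TT": ("text", "text"),      # text-to-text translation
    "ASR": ("speech", "text"),     # automatic speech recognition
}


def output_is_speech(task: str) -> bool:
    """Return True when the task produces audio rather than text."""
    _, output_modality = TASKS[task.upper()]
    return output_modality == "speech"


def tasks_for_input(modality: str) -> list[str]:
    """List the tasks that accept the given input modality ("speech" or "text")."""
    return [name for name, (src, _) in TASKS.items() if src == modality]
```

For example, `tasks_for_input("text")` lists the two tasks that start from text, and `output_is_speech("S2TT")` is false because that task ends in text.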

📦 Installation

To use SeamlessM4T with 🤗 Transformers, follow these steps:

First, install the 🤗 Transformers library from the main branch, along with sentencepiece:

pip install git+https://github.com/huggingface/transformers.git sentencepiece
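
After installing, it can be useful to confirm that the packages are visible to the interpreter before downloading the model. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the package list is taken from the install command above.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Package names taken from the pip command above.
    missing = missing_packages(["transformers", "sentencepiece"])
    print("missing:", missing or "none")
```

The usage examples further down also import torchaudio, so that name can be added to the list if you plan to run them.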

💻 Usage Examples

Basic Usage

Run the following Python code to generate speech samples. Here the target language is Russian:

from transformers import AutoProcessor, SeamlessM4Tv2Model
import torchaudio

# Load the processor and the model weights from the Hub.
processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# from text: English text in, Russian speech out
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()

# from audio: speech in, Russian speech out
# Note: loading directly from a URL may depend on the torchaudio version and backend;
# if it fails, download the file first and pass the local path.
audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000)  # must be a 16 kHz waveform array
audio_inputs = processor(audios=audio, return_tensors="pt")
audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
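
The calls above return audio. For the text-output tasks (S2TT, T2TT, ASR), the 🤗 Transformers `generate` method accepts `generate_speech=False` and returns token ids, which the processor can decode. Below is a minimal sketch, assuming `model` and `processor` were loaded as in the basic example; the helper name `translate_to_text` is introduced here for illustration only.

```python
def translate_to_text(model, processor, inputs, tgt_lang):
    """Run generation with speech output disabled and decode the resulting text.

    `inputs` is the dict-like object returned by `processor(...)`,
    built from either text or a 16 kHz audio array.
    """
    # generate_speech=False asks the model for text tokens instead of a waveform.
    output_tokens = model.generate(**inputs, tgt_lang=tgt_lang, generate_speech=False)
    # The first element holds the generated sequences; decode the first one.
    return processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)


# Example, reusing the objects from the basic usage code:
# translated = translate_to_text(model, processor, text_inputs, tgt_lang="rus")
```

Passing the model and processor in as arguments keeps the helper independent of how they were loaded.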

Advanced Usage

Listen to the audio samples either in an ipynb notebook:

from IPython.display import Audio

sample_rate = model.sampling_rate
Audio(audio_array_from_text, rate=sample_rate)
# Audio(audio_array_from_audio, rate=sample_rate)

Or save them as a .wav file using a third-party library, e.g., scipy:

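import scipy.io.wavfile  # import the submodule explicitly; a bare "import scipy" may not load it on older SciPy versions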

sample_rate = model.sampling_rate
scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=audio_array_from_text)
# scipy.io.wavfile.write("out_from_audio.wav", rate=sample_rate, data=audio_array_from_audio)
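
If scipy is not installed, the Python standard library can write the same kind of file. The sketch below assumes the audio is a one-dimensional sequence of floats in roughly the range [-1, 1] and converts it to 16-bit PCM. `write_wav_pcm16` is a hypothetical helper name, and the 16 kHz rate in the demo is only an example value; in practice pass `model.sampling_rate`.

```python
import math
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples (expected range [-1, 1]) as a 16-bit PCM .wav file."""
    frames = bytearray()
    for value in samples:
        clipped = max(-1.0, min(1.0, float(value)))  # clip out-of-range values
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


if __name__ == "__main__":
    # Demo: one second of a 440 Hz tone at an assumed 16 kHz rate.
    rate = 16_000
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    write_wav_pcm16("tone_example.wav", tone, rate)
```

With the arrays from the basic example, the call would look like `write_wav_pcm16("out_from_text.wav", audio_array_from_text, model.sampling_rate)`.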

For more details on running inference with the SeamlessM4T model in the 🤗 Transformers library, refer to the SeamlessM4T v2 docs or to this hands-on Google Colab.

📚 Documentation

SeamlessM4T models

Property	Details
Model Types	SeamlessM4T-Large v2, SeamlessM4T-Large (v1), SeamlessM4T-Medium (v1)
#params	2.3B for SeamlessM4T-Large v2 and SeamlessM4T-Large (v1), 1.2B for SeamlessM4T-Medium (v1)
checkpoint	checkpoint for SeamlessM4T-Large v2, checkpoint for SeamlessM4T-Large (v1), checkpoint for SeamlessM4T-Medium (v1)
metrics	metrics for SeamlessM4T-Large v2, metrics for SeamlessM4T-Large (v1), metrics for SeamlessM4T-Medium (v1)

The metrics files above contain the extensive evaluation results for SeamlessM4T-Large and SeamlessM4T-Medium reported in the paper (as averages). The evaluation data IDs for FLEURS, CoVoST2, and CVSS-C can be found here.

Evaluating SeamlessM4T models

To reproduce our results or to evaluate using the same metrics over your own test sets, please check out the Evaluation README here.

Finetuning SeamlessM4T models

Please check out the Finetuning README here.

Supported Languages

The following table shows the languages supported by SeamlessM4T-Large (v1/v2). The Source column specifies whether a language is supported as source speech (Sp) and/or source text (Tx). The Target column specifies whether a language is supported as target speech (Sp) and/or target text (Tx).

Code	Language	Script	Source	Target
afr	Afrikaans	Latn	Sp, Tx	Tx
amh	Amharic	Ethi	Sp, Tx	Tx
arb	Modern Standard Arabic	Arab	Sp, Tx	Sp, Tx
ary	Moroccan Arabic	Arab	Sp, Tx	Tx
arz	Egyptian Arabic	Arab	Sp, Tx	Tx
asm	Assamese	Beng	Sp, Tx	Tx
ast	Asturian	Latn	Sp	--
azj	North Azerbaijani	Latn	Sp, Tx	Tx
bel	Belarusian	Cyrl	Sp, Tx	Tx
ben	Bengali	Beng	Sp, Tx	Sp, Tx
bos	Bosnian	Latn	Sp, Tx	Tx
bul	Bulgarian	Cyrl	Sp, Tx	Tx
cat	Catalan	Latn	Sp, Tx	Sp, Tx
ceb	Cebuano	Latn	Sp, Tx	Tx
ces	Czech	Latn	Sp, Tx	Sp, Tx
ckb	Central Kurdish	Arab	Sp, Tx	Tx
cmn	Mandarin Chinese	Hans	Sp, Tx	Sp, Tx
cmn_Hant	Mandarin Chinese	Hant	Sp, Tx	Sp, Tx
cym	Welsh	Latn	Sp, Tx	Sp, Tx
dan	Danish	Latn	Sp, Tx	Sp, Tx
deu	German	Latn	Sp, Tx	Sp, Tx
ell	Greek	Grek	Sp, Tx	Tx
eng	English	Latn	Sp, Tx	Sp, Tx
est	Estonian	Latn	Sp, Tx	Sp, Tx
eus	Basque	Latn	Sp, Tx	Tx
fin	Finnish	Latn	Sp, Tx	Sp, Tx
fra	French	Latn	Sp, Tx	Sp, Tx
fuv	Nigerian Fulfulde	Latn	Sp, Tx	Tx
gaz	West Central Oromo	Latn	Sp, Tx	Tx
gle	Irish	Latn	Sp, Tx	Tx
glg	Galician	Latn	Sp, Tx	Tx
guj	Gujarati	Gujr	Sp, Tx	Tx
heb	Hebrew	Hebr	Sp, Tx	Tx
hin	Hindi	Deva	Sp, Tx	Sp, Tx
hrv	Croatian	Latn	Sp, Tx	Tx
hun	Hungarian	Latn	Sp, Tx	Tx
hye	Armenian	Armn	Sp, Tx	Tx
ibo	Igbo	Latn	Sp, Tx	Tx
ind	Indonesian	Latn	Sp, Tx	Sp, Tx
isl	Icelandic	Latn	Sp, Tx	Tx
ita	Italian	Latn	Sp, Tx	Sp, Tx
jav	Javanese	Latn	Sp, Tx	Tx
jpn	Japanese	Jpan	Sp, Tx	Sp, Tx
kam	Kamba	Latn	Sp	--
kan	Kannada	Knda	Sp, Tx	Tx
kat	Georgian	Geor	Sp, Tx	Tx
kaz	Kazakh	Cyrl	Sp, Tx	Tx
kea	Kabuverdianu	Latn	Sp	--
khk	Halh Mongolian	Cyrl	Sp, Tx	Tx
khm	Khmer	Khmr	Sp, Tx	Tx
kir	Kyrgyz	Cyrl	Sp, Tx	Tx
kor	Korean	Kore	Sp, Tx	Sp, Tx
lao	Lao	Laoo	Sp, Tx	Tx
lit	Lithuanian	Latn	Sp, Tx	Tx
ltz	Luxembourgish	Latn	Sp	--
lug	Ganda	Latn	Sp, Tx	Tx
luo	Luo	Latn	Sp, Tx	Tx
lvs	Standard Latvian	Latn	Sp, Tx	Tx
mai	Maithili	Deva	Sp, Tx	Tx
mal	Malayalam	Mlym	Sp, Tx	Tx
mar	Marathi	Deva	Sp, Tx	Tx
mkd	Macedonian	Cyrl	Sp, Tx	Tx
mlt	Maltese	Latn	Sp, Tx	Sp, Tx
mni	Meitei	Beng	Sp, Tx	Tx
mya	Burmese	Mymr	Sp, Tx	Tx
nld	Dutch	Latn	Sp, Tx	Sp, Tx
nno	Norwegian Nynorsk	Latn	Sp, Tx	Tx
nob	Norwegian Bokmål	Latn	Sp, Tx	Tx
npi	Nepali	Deva	Sp, Tx	Tx
nya	Nyanja	Latn	Sp, Tx	Tx
oci	Occitan	Latn	Sp	--
ory	Odia	Orya	Sp, Tx	Tx
pan	Punjabi	Guru	Sp, Tx	Tx
pbt	Southern Pashto	Arab	Sp, Tx	Tx
pes	Western Persian	Arab	Sp, Tx	Sp, Tx
pol	Polish	Latn	Sp, Tx	Sp, Tx
por	Portuguese	Latn	Sp, Tx	Sp, Tx
ron	Romanian	Latn	Sp, Tx	Sp, Tx
rus	Russian	Cyrl	Sp, Tx	Sp, Tx
slk	Slovak	Latn	Sp, Tx	Sp, Tx
slv	Slovenian	Latn	Sp, Tx	Tx
sna	Shona	Latn	Sp, Tx	Tx
snd	Sindhi	Arab	Sp, Tx	Tx
som	Somali	Latn	Sp, Tx	Tx
spa	Spanish	Latn	Sp, Tx	Sp, Tx
srp	Serbian	Cyrl	Sp, Tx	Tx
swe	Swedish	Latn	Sp, Tx	Sp, Tx
swh	Swahili	Latn	Sp, Tx	Sp, Tx
tam	Tamil	Taml	Sp, Tx	Tx
tel	Telugu	Telu	Sp, Tx	Sp, Tx
tgk	Tajik	Cyrl	Sp, Tx	Tx
tgl	Tagalog	Latn	Sp, Tx	Sp, Tx
tha	Thai	Thai	Sp, Tx	Sp, Tx
tur	Turkish	Latn	Sp, Tx	Sp, Tx
ukr	Ukrainian	Cyrl	Sp, Tx	Sp, Tx
urd	Urdu	Arab	Sp, Tx	Sp, Tx
uzn	Northern Uzbek	Latn	Sp, Tx	Tx
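
To check programmatically whether a language pair is valid for a given input and output modality, the table can be encoded as a lookup. The sketch below copies only a handful of rows from the table above for brevity. The `SUPPORT` dictionary and `can_translate` helper are hypothetical and are not part of the SeamlessM4T or 🤗 Transformers APIs.

```python
# A few rows copied from the table: code -> (source modalities, target modalities).
SUPPORT = {
    "afr": ({"Sp", "Tx"}, {"Tx"}),
    "ast": ({"Sp"}, set()),            # source speech only, no target support
    "eng": ({"Sp", "Tx"}, {"Sp", "Tx"}),
    "rus": ({"Sp", "Tx"}, {"Sp", "Tx"}),
    "tam": ({"Sp", "Tx"}, {"Tx"}),
}


def can_translate(src_lang, src_modality, tgt_lang, tgt_modality):
    """Return True if the pair is supported; modalities are "Sp" or "Tx"."""
    if src_lang not in SUPPORT or tgt_lang not in SUPPORT:
        return False  # language not present in this partial table
    sources, _ = SUPPORT[src_lang]
    _, targets = SUPPORT[tgt_lang]
    return src_modality in sources and tgt_modality in targets
```

With these rows, English text to Russian speech is accepted, while any request for Tamil speech output is rejected because Tamil is listed as a text-only target.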

📄 License

The model is released under the cc-by-nc-4.0 license.
