🚀 SeamlessM4T v2
SeamlessM4T is our foundational all-in-one Massively Multilingual and Multimodal Machine Translation model. It delivers high-quality translation for speech and text in nearly 100 languages.
✨ Features
- Supported Tasks: SeamlessM4T models support speech-to-speech translation (S2ST), speech-to-text translation (S2TT), text-to-speech translation (T2ST), text-to-text translation (T2TT), and automatic speech recognition (ASR).
- Language Support:
  - 🎤 It supports 101 languages for speech input.
  - 💬 It supports 96 languages for text input/output.
  - 🔊 It supports 35 languages for speech output.
- New Version: We are releasing SeamlessM4T v2, an updated version built on our novel UnitY2 architecture. The v2 model is a multitask adaptation of UnitY2. Its hierarchical character-to-unit upsampling and non-autoregressive text-to-unit decoding considerably improve both quality and inference speed in speech generation tasks compared to SeamlessM4T v1.
- 🤗 Transformers Support: SeamlessM4T v2 is also supported by 🤗 Transformers. More details here.
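The five tasks listed above differ only in their input and output modality. The following is a minimal sketch of that mapping, not part of the SeamlessM4T API: the names `TASKS` and `tasks_for` are hypothetical and used only for illustration.

```python
# Input and output modality for each task named in the feature list.
# TASKS and tasks_for are illustrative names, not part of any library.
TASKS = {
    "S2ST": ("speech", "speech"),
    "S2TT": ("speech", "text"),
    "T2ST": ("text", "speech"),
    "T2TT": ("text", "text"),
    "ASR": ("speech", "text"),  # transcription: speech in, text out
}


def tasks_for(input_modality, output_modality):
    """Return the task names that match the given input/output modalities."""
    return [
        name
        for name, (src, tgt) in TASKS.items()
        if (src, tgt) == (input_modality, output_modality)
    ]
```

For example, `tasks_for("speech", "text")` lists both S2TT and ASR, since both take speech as input and produce text.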
📦 Installation
To use SeamlessM4T with 🤗 Transformers, follow these steps:
- First, install the 🤗 Transformers library from main and sentencepiece:
pip install git+https://github.com/huggingface/transformers.git sentencepiece
💻 Usage Examples
Basic Usage
Run the following Python code to generate speech samples. Here the target language is Russian:
from transformers import AutoProcessor, SeamlessM4Tv2Model
import torchaudio

# Load the processor and the model from the Hugging Face Hub
processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# From text: translate English text into Russian speech
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()

# From audio: load a sample, resample it to 16 kHz, then translate it into Russian speech
audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000)
audio_inputs = processor(audios=audio, return_tensors="pt")
audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
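The calls above return speech. For the text-output tasks (S2TT, T2TT, ASR), the 🤗 Transformers documentation describes passing `generate_speech=False` to `generate` and decoding the resulting token ids with the processor. The following is a minimal sketch under that assumption. The helper name `translate_to_text` is hypothetical, and the indexing of the output may differ between library versions.

```python
def translate_to_text(model, processor, inputs, tgt_lang):
    """Generate translated text instead of speech (illustrative helper).

    Assumes `model` is a SeamlessM4Tv2Model, `processor` its matching
    processor, and `inputs` the dict returned by calling the processor.
    """
    # generate_speech=False skips the speech synthesis stage and
    # returns text token ids only.
    output_tokens = model.generate(**inputs, tgt_lang=tgt_lang, generate_speech=False)
    # Take the first sequence of the batch and decode it to a string.
    token_ids = output_tokens[0].tolist()[0]
    return processor.decode(token_ids, skip_special_tokens=True)
```

With the objects defined earlier, `translate_to_text(model, processor, text_inputs, "rus")` would return the Russian translation as a string.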
Advanced Usage
Listen to the audio samples either in an ipynb notebook:
from IPython.display import Audio

# The output sampling rate is stored on the model configuration
sample_rate = model.config.sampling_rate
Audio(audio_array_from_text, rate=sample_rate)
Or save them as a .wav file using a third-party library, e.g., scipy:
import scipy.io.wavfile

# The output sampling rate is stored on the model configuration
sample_rate = model.config.sampling_rate
scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=audio_array_from_text)
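If scipy is not available, the same file can be written with Python's standard library. The following is a minimal sketch, assuming the generated waveform is a one-dimensional sequence of floats in the range [-1.0, 1.0]. The helper name `write_wav_int16` is hypothetical, and converting to 16-bit PCM is a choice made here, not something the model requires.

```python
import struct
import wave


def write_wav_int16(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM .wav file."""
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range so out-of-range values do not overflow.
        clipped = max(-1.0, min(1.0, float(sample)))
        # Scale to a signed 16-bit integer, little-endian.
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
```

It would be called with the generated array and the same sample rate used for playback, for example `write_wav_int16("out_from_text.wav", audio_array_from_text, sample_rate)`.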
For more details on using the SeamlessM4T model for inference with the 🤗 Transformers library, refer to the SeamlessM4T v2 docs or to this hands-on Google Colab.
📚 Documentation
SeamlessM4T models
We provide the extensive evaluation results of SeamlessM4T-Large and SeamlessM4T-Medium reported in the paper (as averages) in the metrics files above. The evaluation data IDs for FLEURS, CoVoST2 and CVSS-C can be found here.
Evaluating SeamlessM4T models
To reproduce our results or to evaluate using the same metrics over your own test sets, please check out the Evaluation README here.
Finetuning SeamlessM4T models
Please check out the Finetuning README here.
Supported Languages
The following table shows the languages supported by SeamlessM4T-Large (v1/v2). The Source column specifies whether a language is supported as source speech (Sp) and/or source text (Tx). The Target column specifies whether a language is supported as target speech (Sp) and/or target text (Tx).
| code | language | script | Source | Target |
|---|---|---|---|---|
| afr | Afrikaans | Latn | Sp, Tx | Tx |
| amh | Amharic | Ethi | Sp, Tx | Tx |
| arb | Modern Standard Arabic | Arab | Sp, Tx | Sp, Tx |
| ary | Moroccan Arabic | Arab | Sp, Tx | Tx |
| arz | Egyptian Arabic | Arab | Sp, Tx | Tx |
| asm | Assamese | Beng | Sp, Tx | Tx |
| ast | Asturian | Latn | Sp | -- |
| azj | North Azerbaijani | Latn | Sp, Tx | Tx |
| bel | Belarusian | Cyrl | Sp, Tx | Tx |
| ben | Bengali | Beng | Sp, Tx | Sp, Tx |
| bos | Bosnian | Latn | Sp, Tx | Tx |
| bul | Bulgarian | Cyrl | Sp, Tx | Tx |
| cat | Catalan | Latn | Sp, Tx | Sp, Tx |
| ceb | Cebuano | Latn | Sp, Tx | Tx |
| ces | Czech | Latn | Sp, Tx | Sp, Tx |
| ckb | Central Kurdish | Arab | Sp, Tx | Tx |
| cmn | Mandarin Chinese | Hans | Sp, Tx | Sp, Tx |
| cmn_Hant | Mandarin Chinese | Hant | Sp, Tx | Sp, Tx |
| cym | Welsh | Latn | Sp, Tx | Sp, Tx |
| dan | Danish | Latn | Sp, Tx | Sp, Tx |
| deu | German | Latn | Sp, Tx | Sp, Tx |
| ell | Greek | Grek | Sp, Tx | Tx |
| eng | English | Latn | Sp, Tx | Sp, Tx |
| est | Estonian | Latn | Sp, Tx | Sp, Tx |
| eus | Basque | Latn | Sp, Tx | Tx |
| fin | Finnish | Latn | Sp, Tx | Sp, Tx |
| fra | French | Latn | Sp, Tx | Sp, Tx |
| fuv | Nigerian Fulfulde | Latn | Sp, Tx | Tx |
| gaz | West Central Oromo | Latn | Sp, Tx | Tx |
| gle | Irish | Latn | Sp, Tx | Tx |
| glg | Galician | Latn | Sp, Tx | Tx |
| guj | Gujarati | Gujr | Sp, Tx | Tx |
| heb | Hebrew | Hebr | Sp, Tx | Tx |
| hin | Hindi | Deva | Sp, Tx | Sp, Tx |
| hrv | Croatian | Latn | Sp, Tx | Tx |
| hun | Hungarian | Latn | Sp, Tx | Tx |
| hye | Armenian | Armn | Sp, Tx | Tx |
| ibo | Igbo | Latn | Sp, Tx | Tx |
| ind | Indonesian | Latn | Sp, Tx | Sp, Tx |
| isl | Icelandic | Latn | Sp, Tx | Tx |
| ita | Italian | Latn | Sp, Tx | Sp, Tx |
| jav | Javanese | Latn | Sp, Tx | Tx |
| jpn | Japanese | Jpan | Sp, Tx | Sp, Tx |
| kam | Kamba | Latn | Sp | -- |
| kan | Kannada | Knda | Sp, Tx | Tx |
| kat | Georgian | Geor | Sp, Tx | Tx |
| kaz | Kazakh | Cyrl | Sp, Tx | Tx |
| kea | Kabuverdianu | Latn | Sp | -- |
| khk | Halh Mongolian | Cyrl | Sp, Tx | Tx |
| khm | Khmer | Khmr | Sp, Tx | Tx |
| kir | Kyrgyz | Cyrl | Sp, Tx | Tx |
| kor | Korean | Kore | Sp, Tx | Sp, Tx |
| lao | Lao | Laoo | Sp, Tx | Tx |
| lit | Lithuanian | Latn | Sp, Tx | Tx |
| ltz | Luxembourgish | Latn | Sp | -- |
| lug | Ganda | Latn | Sp, Tx | Tx |
| luo | Luo | Latn | Sp, Tx | Tx |
| lvs | Standard Latvian | Latn | Sp, Tx | Tx |
| mai | Maithili | Deva | Sp, Tx | Tx |
| mal | Malayalam | Mlym | Sp, Tx | Tx |
| mar | Marathi | Deva | Sp, Tx | Tx |
| mkd | Macedonian | Cyrl | Sp, Tx | Tx |
| mlt | Maltese | Latn | Sp, Tx | Sp, Tx |
| mni | Meitei | Beng | Sp, Tx | Tx |
| mya | Burmese | Mymr | Sp, Tx | Tx |
| nld | Dutch | Latn | Sp, Tx | Sp, Tx |
| nno | Norwegian Nynorsk | Latn | Sp, Tx | Tx |
| nob | Norwegian Bokmål | Latn | Sp, Tx | Tx |
| npi | Nepali | Deva | Sp, Tx | Tx |
| nya | Nyanja | Latn | Sp, Tx | Tx |
| oci | Occitan | Latn | Sp | -- |
| ory | Odia | Orya | Sp, Tx | Tx |
| pan | Punjabi | Guru | Sp, Tx | Tx |
| pbt | Southern Pashto | Arab | Sp, Tx | Tx |
| pes | Western Persian | Arab | Sp, Tx | Sp, Tx |
| pol | Polish | Latn | Sp, Tx | Sp, Tx |
| por | Portuguese | Latn | Sp, Tx | Sp, Tx |
| ron | Romanian | Latn | Sp, Tx | Sp, Tx |
| rus | Russian | Cyrl | Sp, Tx | Sp, Tx |
| slk | Slovak | Latn | Sp, Tx | Sp, Tx |
| slv | Slovenian | Latn | Sp, Tx | Tx |
| sna | Shona | Latn | Sp, Tx | Tx |
| snd | Sindhi | Arab | Sp, Tx | Tx |
| som | Somali | Latn | Sp, Tx | Tx |
| spa | Spanish | Latn | Sp, Tx | Sp, Tx |
| srp | Serbian | Cyrl | Sp, Tx | Tx |
| swe | Swedish | Latn | Sp, Tx | Sp, Tx |
| swh | Swahili | Latn | Sp, Tx | Sp, Tx |
| tam | Tamil | Taml | Sp, Tx | Tx |
| tel | Telugu | Telu | Sp, Tx | Sp, Tx |
| tgk | Tajik | Cyrl | Sp, Tx | Tx |
| tgl | Tagalog | Latn | Sp, Tx | Sp, Tx |
| tha | Thai | Thai | Sp, Tx | Sp, Tx |
| tur | Turkish | Latn | Sp, Tx | Sp, Tx |
| ukr | Ukrainian | Cyrl | Sp, Tx | Sp, Tx |
| urd | Urdu | Arab | Sp, Tx | Sp, Tx |
| uzn | Northern Uzbek | Latn | Sp, Tx | Tx |
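Before calling the model, it can help to check a language pair against the table. The following is a minimal sketch that encodes a handful of rows copied from the table above. The names `SUPPORT` and `can_translate` are hypothetical and not part of any SeamlessM4T or 🤗 Transformers API, and a real check would need the full table.

```python
# A few rows copied from the table: code -> (source modalities, target modalities).
# "Sp" is speech and "Tx" is text, as in the table. Illustrative subset only.
SUPPORT = {
    "afr": ({"Sp", "Tx"}, {"Tx"}),
    "ast": ({"Sp"}, set()),
    "eng": ({"Sp", "Tx"}, {"Sp", "Tx"}),
    "rus": ({"Sp", "Tx"}, {"Sp", "Tx"}),
    "uzn": ({"Sp", "Tx"}, {"Tx"}),
}


def can_translate(src_lang, src_modality, tgt_lang, tgt_modality):
    """Check a source/target pair against SUPPORT; unknown codes return False."""
    if src_lang not in SUPPORT or tgt_lang not in SUPPORT:
        return False
    source_modalities = SUPPORT[src_lang][0]
    target_modalities = SUPPORT[tgt_lang][1]
    return src_modality in source_modalities and tgt_modality in target_modalities
```

For instance, English text to Russian speech passes the check, while any request for Afrikaans speech output does not, because Afrikaans is listed with text as its only target modality.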
📄 License
The model is released under the cc-by-nc-4.0 license.