# SeamlessM4T Large

SeamlessM4T is a collection of models that offer high-quality translation, enabling people from diverse linguistic backgrounds to communicate effortlessly through speech and text.
## 🚀 Quick Start

SeamlessM4T Large is a powerful unified model that supports multiple translation and recognition tasks. It can handle various input types (speech and text) and generate different output types (speech and text) across multiple languages.
## ✨ Features

- **Multilingual support**:
  - 🎤 101 languages for speech input.
  - ⌨️ 96 languages for text input/output.
  - 🗣️ 35 languages for speech output.
- **Multitask capability**: a single model handles multiple tasks without relying on separate models, including:
  - Speech-to-speech translation (S2ST)
  - Speech-to-text translation (S2TT)
  - Text-to-speech translation (T2ST)
  - Text-to-text translation (T2TT)
  - Automatic speech recognition (ASR)
## 📦 Installation

The model is available through the 🤗 Transformers library. Install it with `pip`:

```bash
pip install transformers
```

The SeamlessM4T tokenizer is SentencePiece-based, so you may also need `pip install sentencepiece` if it is not already present in your environment.
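SeamlessM4T support landed in a relatively recent Transformers release (4.35.0, to the best of my recollection — treat that floor as an assumption), so it can be worth verifying that your installed version is new enough. A minimal sketch of such a check:

```python
def supports_seamless_m4t(version: str, minimum: str = "4.35.0") -> bool:
    """Return True if `version` is at least `minimum` (numeric compare).

    The 4.35.0 floor is an assumption about when SeamlessM4T support
    was added to Transformers; adjust it if the release notes differ.
    """
    def to_tuple(v: str) -> tuple:
        # Keep only leading numeric components ("4.36.1" -> (4, 36, 1)).
        return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

    return to_tuple(version) >= to_tuple(minimum)

# Example usage against the installed library:
# import transformers
# print(supports_seamless_m4t(transformers.__version__))
```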
## 💻 Usage Examples

### Basic Usage

First, load the processor and a checkpoint of the model:

```python
>>> from transformers import AutoProcessor, SeamlessM4TModel

>>> processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-large")
>>> model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-large")
```

Here is how to use the processor to process text and audio:

```python
>>> # let's load an audio sample from an Arabic speech corpus
>>> from datasets import load_dataset
>>> dataset = load_dataset("arabic_speech_corpus", split="test", streaming=True)
>>> audio_sample = next(iter(dataset))["audio"]

>>> # now, process it
>>> audio_inputs = processor(audios=audio_sample["array"], return_tensors="pt")

>>> # now, process some English text as well
>>> text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
```
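Note that the feature extractor expects mono audio at 16 kHz; if your sample uses a different rate, resample it before calling the processor (`torchaudio.functional.resample` is the usual tool). As a dependency-free illustration of the idea — with the 16 kHz target taken as the model's assumed input rate — here is a linear-interpolation resampler:

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a mono waveform to `dst_rate` via linear interpolation.

    The 16 kHz default reflects the input rate the SeamlessM4T feature
    extractor is assumed to expect; for production use, prefer a proper
    resampler such as torchaudio.functional.resample.
    """
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Map the output index back to a fractional input position.
        pos = i * (len(samples) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```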
### Advanced Usage

#### Speech Translation

```python
>>> audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
>>> audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
```
#### Text Translation

```python
>>> # from audio
>>> output_tokens = model.generate(**audio_inputs, tgt_lang="fra", generate_speech=False)
>>> translated_text_from_audio = processor.decode(output_tokens[0].tolist(), skip_special_tokens=True)

>>> # from text
>>> output_tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)
>>> translated_text_from_text = processor.decode(output_tokens[0].tolist(), skip_special_tokens=True)
```
## Tips

### 1. Use dedicated models

You can use dedicated models to reduce the memory footprint. For example:

```python
>>> from transformers import SeamlessM4TForSpeechToSpeech
>>> model = SeamlessM4TForSpeechToSpeech.from_pretrained("facebook/hf-seamless-m4t-large")
```

Or for text-to-text translation:

```python
>>> from transformers import SeamlessM4TForTextToText
>>> model = SeamlessM4TForTextToText.from_pretrained("facebook/hf-seamless-m4t-large")
```

You can also try out `SeamlessM4TForSpeechToText` and `SeamlessM4TForTextToSpeech`.
### 2. Change the speaker identity

You can change the speaker used for speech synthesis with the `spkr_id` argument. Some `spkr_id` values work better than others for certain languages.
### 3. Change the generation strategy

You can use different generation strategies for speech and text generation, e.g. `.generate(input_ids=input_ids, text_num_beams=4, speech_do_sample=True)`.
### 4. Generate speech and text at the same time

Use `return_intermediate_token_ids=True` with `SeamlessM4TModel` to return both speech and text.
## 📚 Documentation

- **New version information**:
  - SeamlessM4T v2, an improved version with a novel architecture, has been released. It improves over SeamlessM4T v1 in quality and inference speed for speech generation tasks.
  - SeamlessM4T v2 is also supported by 🤗 Transformers; more information can be found in the model card of the new version or directly in the 🤗 Transformers docs.
## 📄 License

This model is licensed under the `cc-by-nc-4.0` license.