# Massively Multilingual Speech (MMS) - Fine-tuned LID
This repository provides a fine-tuned model for speech language identification (LID). It is part of Facebook's Massively Multilingual Speech (MMS) project and is based on the Wav2Vec2 architecture. The model has roughly 1 billion parameters, is fine-tuned from facebook/mms-1b, and classifies raw audio input into a probability distribution over 126 languages.
## Quick Start
This MMS checkpoint can be used with Transformers to identify the spoken language of an audio sample. It can recognize 126 languages.
### Installation
First, we need to install some necessary libraries:
```bash
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
```
### ⚠️ Important Note
In order to use MMS you need to have at least `transformers >= 4.30` installed. If the 4.30 version is not yet available on PyPI, make sure to install `transformers` from source:

```bash
pip install git+https://github.com/huggingface/transformers.git
```
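If you want to verify programmatically that the requirement is met, a minimal sanity check (not part of the original instructions; it uses the packaging library, which ships as a transformers dependency) looks like this:

```python
import transformers
from packaging import version

# MMS support landed in transformers 4.30, so fail early on older installs.
assert version.parse(transformers.__version__) >= version.parse("4.30.0"), (
    f"transformers {transformers.__version__} is too old for MMS; please upgrade."
)
```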
### Usage Examples
#### Basic Usage
Next, we load a couple of audio samples, the model, and the processor, and then classify each sample's language.
```python
from datasets import load_dataset, Audio

# English sample from Common Voice 13
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]

# Arabic sample from Common Voice 13
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "ar", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
ar_sample = next(iter(stream_data))["audio"]["array"]
```
```python
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch

model_id = "facebook/mms-lid-126"

processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)

# Classify the English sample
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits

lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
# expected: 'eng'

# Classify the Arabic sample
inputs = processor(ar_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits

lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
# expected: 'ara'
```
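Since the classification head produces logits over all 126 languages, you can also look at the full probability distribution instead of just the argmax. The following is a small illustrative sketch (not from the original card) that applies a softmax to the logits of the last processed sample and prints the five most likely languages:

```python
# Convert the logits of the last sample into probabilities
# and list the five most likely languages with their scores.
probs = torch.softmax(outputs, dim=-1)[0]
top_probs, top_ids = torch.topk(probs, k=5)

for p, i in zip(top_probs.tolist(), top_ids.tolist()):
    print(f"{model.config.id2label[i]}: {p:.3f}")
```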
To see all the languages supported by a checkpoint, you can print out the language ids stored in the model config (note that the labels live on the model config, not on the feature extractor):

```python
model.config.id2label.values()
```
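For example, to count the labels and print them in alphabetical order (a small illustrative snippet):

```python
langs = sorted(model.config.id2label.values())
print(len(langs))  # 126
print(langs[:5])   # first few ISO 639-3 codes, e.g. ['abk', 'afr', 'amh', ...]
```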
For more details about the architecture, please refer to the official docs.
## ✨ Features

- **Multilingual Support**: The model supports 126 languages, providing wide-ranging language identification capabilities.
- **Based on Wav2Vec2**: It leverages the Wav2Vec2 architecture for accurate audio classification.
## Supported Languages
This model supports 126 languages, listed below by their ISO 639-3 codes. You can find more details about the languages and their ISO 639-3 codes in the MMS Language Coverage Overview.
- ara
- cmn
- eng
- spa
- fra
- mlg
- swe
- por
- vie
- ful
- sun
- asm
- ben
- zlm
- kor
- ind
- hin
- tuk
- urd
- aze
- slv
- mon
- hau
- tel
- swh
- bod
- rus
- tur
- heb
- mar
- som
- tgl
- tat
- tha
- cat
- ron
- mal
- bel
- pol
- yor
- nld
- bul
- hat
- afr
- isl
- amh
- tam
- hun
- hrv
- lit
- cym
- fas
- mkd
- ell
- bos
- deu
- sqi
- jav
- nob
- uzb
- snd
- lat
- nya
- grn
- mya
- orm
- lin
- hye
- yue
- pan
- jpn
- kaz
- npi
- kat
- guj
- kan
- tgk
- ukr
- ces
- lav
- bak
- khm
- fao
- glg
- ltz
- lao
- mlt
- sin
- sna
- ita
- srp
- mri
- nno
- pus
- eus
- ory
- lug
- bre
- luo
- slk
- fin
- dan
- yid
- est
- ceb
- war
- san
- kir
- oci
- wol
- haw
- kam
- umb
- xho
- epo
- zul
- ibo
- abk
- ckb
- nso
- gle
- kea
- ast
- sco
- glv
- ina
## Model details
| Property | Details |
|----------|---------|
| Developed by | Vineel Pratap et al. |
| Model type | Speech language identification (LID) model, based on Wav2Vec2 |
| Language(s) | 126 languages, see supported languages |
| License | CC-BY-NC 4.0 |
| Num parameters | 1 billion |
| Audio sampling rate | 16 kHz (16,000 Hz) |
| Cite as | @article{pratap2023mms, title={Scaling Speech Technology to 1,000+ Languages}, author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli}, journal={arXiv}, year={2023}} |
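As the table notes, the model expects audio sampled at 16 kHz. If you want to run LID on a local file at a different sampling rate, a hedged sketch using torchaudio (installed in the Quick Start) might look like the following; `my_audio.wav` is a placeholder path, and `processor` and `model` are the objects loaded in the usage example above:

```python
import torch
import torchaudio

# Load a local file ("my_audio.wav" is a placeholder) and resample to 16 kHz.
waveform, sr = torchaudio.load("my_audio.wav")
if sr != 16_000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16_000)

# Downmix to mono, then classify exactly as in the usage example above.
sample = waveform.mean(dim=0).numpy()
inputs = processor(sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```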
## Additional Links