# NVIDIA Conformer-Transducer Large (fr)
This large-sized Conformer-Transducer model (about 120M parameters) was trained on a composite dataset of over 1,500 hours of French speech, offering high-quality automatic speech recognition for French.
## Quick Start
This model is available for use in the NeMo toolkit and can serve as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
## Installation
To train, fine-tune, or play with the model, you will need to install NVIDIA NeMo. We recommend installing it after installing the latest PyTorch version.

```shell
pip install nemo_toolkit['all']
```
## Usage Examples
### Basic Usage
Automatically instantiate the model:

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained("nvidia/stt_fr_conformer_transducer_large")
```
### Advanced Usage
#### Transcribing using Python

First, get a sample:

```shell
wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
```
Then simply run:

```python
output = asr_model.transcribe(['2086-149220-0033.wav'])
print(output[0].text)
```
#### Transcribing many audio files

```shell
python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
  pretrained_name="nvidia/stt_fr_conformer_transducer_large" \
  audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
```
## Documentation
### Model Information
| Property | Details |
|----------|---------|
| Model Type | Conformer-Transducer |
| Language | fr (French) |
| Params | ~120M |
| Training Data | Mozilla Common Voice 7.0 (356 hours), Multilingual LibriSpeech (1,036 hours), VoxPopuli (182 hours) |
This model accepts 16 kHz (16,000 Hz) mono-channel audio (WAV files) as input and returns the transcribed speech as a string for a given audio sample.
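Before transcribing, it can be useful to confirm that an input file matches the expected format. Below is a minimal sketch using only the Python standard library; the helper name `check_wav_format` is our own for illustration, not part of the NeMo API.

```python
# Sketch: verify an input WAV matches the model's expected format
# (16 kHz sample rate, mono). `check_wav_format` is a hypothetical helper.
import wave

EXPECTED_RATE = 16_000
EXPECTED_CHANNELS = 1

def check_wav_format(path):
    """Return True if the WAV file is 16 kHz mono, else False."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == EXPECTED_RATE
                and wav.getnchannels() == EXPECTED_CHANNELS)

# Write one second of 16-bit silence in the expected format, then verify it.
with wave.open("example_16k_mono.wav", "wb") as wav:
    wav.setnchannels(EXPECTED_CHANNELS)
    wav.setsampwidth(2)              # 16-bit samples
    wav.setframerate(EXPECTED_RATE)
    wav.writeframes(b"\x00\x00" * EXPECTED_RATE)

print(check_wav_format("example_16k_mono.wav"))  # True for a conforming file
```

Files that fail this check (e.g. 44.1 kHz stereo recordings) should be resampled and downmixed before being passed to the model.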
See the model architecture section and NeMo documentation for complete architecture details.
### NVIDIA NeMo: Training
The NeMo toolkit [3] was used to train the models for several hundred epochs. These models are trained with this example script and this base config.
The SentencePiece tokenizers [2] for these models were built using the text transcripts of the train set with this script.
## Technical Details
The Conformer-Transducer model is an autoregressive variant of the Conformer model [1] for automatic speech recognition that uses Transducer loss/decoding instead of CTC loss. You can find more details on this model here: Conformer-Transducer Model.
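The key difference from CTC is that the Transducer's joint network conditions each emission on the previously emitted label, and may emit several labels per encoder frame. The toy sketch below illustrates the greedy decoding loop only; `toy_joint` is a deterministic stand-in for the real joint network, not NeMo code.

```python
# Toy illustration (not NeMo code) of greedy Transducer (RNNT) decoding.
# Unlike CTC, the joint network sees the previously emitted label, and the
# decoder may emit multiple labels per encoder frame before advancing.
BLANK = 0
VOCAB = {0: "<blank>", 1: "b", 2: "o", 3: "n"}

def toy_joint(frame, prev_label):
    """Stand-in for the joint network: returns the label to emit given an
    encoder frame index and the previous non-blank label (deterministic)."""
    schedule = {(0, BLANK): 1, (1, 1): 2, (2, 2): 3}
    return schedule.get((frame, prev_label), BLANK)

def greedy_decode(num_frames):
    prev, out = BLANK, []
    for t in range(num_frames):
        while True:               # may emit several labels on one frame
            label = toy_joint(t, prev)
            if label == BLANK:    # blank means: advance to the next frame
                break
            out.append(VOCAB[label])
            prev = label
    return "".join(out)

print(greedy_decode(4))  # "bon"
```

In the real model, `toy_joint` is replaced by a neural joint network combining the Conformer encoder output with a prediction-network state, which is what makes decoding autoregressive.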
## Performance
The performance of automatic speech recognition models is measured using word error rate (WER). Since this model was trained on multiple domains and a much larger corpus, it will generally perform better at transcribing audio in general.
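For reference, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal self-contained implementation (a sketch, not NeMo's metric code):

```python
# Minimal word error rate (WER): word-level edit distance divided by the
# number of words in the reference transcript.
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("le chat est noir", "le chien est noir"))  # 0.25
```

Here one substitution ("chat" → "chien") over four reference words gives a WER of 0.25, i.e. 25%.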
The latest model obtains the following greedy WER scores on the following evaluation datasets:

- 6.85% on MCV7.0 dev
- 7.95% on MCV7.0 test
- 5.05% on MLS dev
- 4.10% on MLS test
Note that these evaluation datasets were filtered and preprocessed to contain only French alphabet characters, with all punctuation removed except hyphens and apostrophes.
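To compare your own hypotheses against such references fairly, you would need to apply the same kind of normalization. The following is a sketch of that filtering under our own assumptions (it is not the exact preprocessing script used for evaluation):

```python
# Sketch of the evaluation-style normalization described above: lowercase,
# then keep only French alphabet characters, hyphens, apostrophes, spaces.
# The exact character set is an assumption, not the official script.
import re

# Lowercase French letters, including accented characters and ligatures.
KEEP = re.compile(r"[^a-zàâäçéèêëîïôöùûüÿœæ' \-]")

def normalize_french(text):
    text = text.lower().replace("’", "'")   # unify apostrophe variants
    text = KEEP.sub("", text)
    return re.sub(r"\s+", " ", text).strip()

print(normalize_french("Où est-elle ? C’est déjà l’été !"))
# où est-elle c'est déjà l'été
```

Question marks and other punctuation are stripped, while the hyphen in "est-elle" and the apostrophes survive, matching the filtering applied to the evaluation sets.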
## Limitations
Since this model was trained on publicly available speech datasets, its performance might degrade for speech that includes technical terms or vernacular the model has not been trained on. The model might also perform worse on accented speech.
Further, since portions of the training set contain text from both pre- and post-1990 orthographic reform, regularity of punctuation may vary between the two styles.
For downstream tasks requiring more consistency, fine-tuning or downstream processing may be required. If exact orthography is not necessary, the use of a secondary model is advised.
## License
This model is licensed under CC-BY-4.0.
## References

[1] Conformer: Convolution-augmented Transformer for Speech Recognition

[2] Google SentencePiece Tokenizer

[3] NVIDIA NeMo Toolkit