Open-source Speech Model wav2vec2-FR-7K-base - Trained on 7.6 thousand hours of French speech for diverse speech recognition


Wav2vec2 FR 7K Base

Developed by LeBenchmark

Base wav2vec2 model trained on 7.6K hours of French speech, including spontaneous, read, and broadcast speech

Speech Recognition

Transformers

French

Open Source License: Apache-2.0

#French Speech Recognition #Self-supervised Learning #Multi-scenario Speech

Downloads 26

Release Time : 3/2/2022

Model Overview

One of the wav2vec2 models released by LeBenchmark for French speech processing, suitable for tasks such as speech recognition

Model Features

Large-scale French Pretraining

Trained on 7.6K hours of French speech data, covering diverse speech types

Multi-architecture Support

Provides four different model architectures: Light, Base, Large, and xLarge

Diverse Data

Training data includes spontaneous, read, and broadcast speech, covering multiple scenarios

Model Capabilities

French Speech Feature Extraction

Speech Representation Learning

Speech Recognition

Speaker Recognition

Source Separation

Use Cases

Speech Processing

French Speech Recognition

Can be used to build French automatic speech recognition systems

Speaker Recognition

Can be used for speaker verification or identification tasks

Speech Research

Speech Representation Learning Research

Can serve as a pretrained model for speech-related research

🚀 LeBenchmark: wav2vec2 base model trained on 7K hours of French speech

LeBenchmark provides a set of wav2vec2 models pretrained on various French datasets containing spontaneous, read, and broadcast speech. It comes in two versions: the later one, LeBenchmark 2.0, extends the first in both the number of pretrained SSL models and the number of downstream tasks. For more details on the benchmarks used to evaluate the wav2vec2 models, refer to our paper: LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

✨ Features

Model and data descriptions

We release our models under our HuggingFace organization. Four wav2vec2 architectures (Light, Base, Large, and xLarge) are paired with our small (1K), medium (3K), large (7K), and extra-large (14K) corpora. In summary:

LeBenchmark 2.0:

[wav2vec2-FR-14K-xlarge](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-xlarge): xLarge wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
[wav2vec2-FR-14K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-large): Large wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
[wav2vec2-FR-14K-light](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-light): Light wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).

LeBenchmark:

[wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large): Large wav2vec2 trained on 7.6K hours of French speech (1.8K Males / 1.0K Females / 4.8K unknown).
[wav2vec2-FR-7K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-base): Base wav2vec2 trained on 7.6K hours of French speech (1.8K Males / 1.0K Females / 4.8K unknown).
[wav2vec2-FR-3K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-large): Large wav2vec2 trained on 2.9K hours of French speech (1.8K Males / 1.0K Females / 0.1K unknown).
[wav2vec2-FR-3K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-base): Base wav2vec2 trained on 2.9K hours of French speech (1.8K Males / 1.0K Females / 0.1K unknown).
[wav2vec2-FR-2.6K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-2.6K-base): Base wav2vec2 trained on 2.6K hours of French speech (no spontaneous speech).
[wav2vec2-FR-1K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-1K-large): Large wav2vec2 trained on 1K hours of French speech (0.5K Males / 0.5K Females).
[wav2vec2-FR-1K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-1K-base): Base wav2vec2 trained on 1K hours of French speech (0.5K Males / 0.5K Females).
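
The checkpoint names listed above follow a `wav2vec2-FR-<corpus>-<size>` pattern. As a convenience, the small Python sketch below builds a repository id or URL from a corpus label and an architecture name and rejects combinations that are not in the list. The names `RELEASED`, `repo_id`, and `repo_url` are hypothetical helpers introduced here for illustration; they are not part of any LeBenchmark or HuggingFace API.

```python
# Corpus label -> architectures released for it, copied from the list above.
RELEASED = {
    "14K": ("xlarge", "large", "light"),
    "7K": ("large", "base"),
    "3K": ("large", "base"),
    "2.6K": ("base",),
    "1K": ("large", "base"),
}


def repo_id(corpus, size):
    """Return the repository id, e.g. 'LeBenchmark/wav2vec2-FR-7K-base'."""
    if size not in RELEASED.get(corpus, ()):
        raise ValueError(
            f"no released checkpoint for corpus={corpus!r}, size={size!r}"
        )
    return f"LeBenchmark/wav2vec2-FR-{corpus}-{size}"


def repo_url(corpus, size):
    """Return the model page URL for a released checkpoint."""
    return f"https://huggingface.co/{repo_id(corpus, size)}"
```

For example, `repo_id("7K", "base")` gives the id of the model described on this page, while `repo_id("14K", "base")` raises a `ValueError` because no such checkpoint appears in the list.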

Intended uses & limitations

Pretrained wav2vec2 models are distributed under the Apache-2.0 license, so they can be reused extensively without strict limitations. However, benchmarks and data may be associated with corpora that are not fully open-sourced.

Fine-tune with Fairseq for ASR with CTC

Since our wav2vec2 models were trained with Fairseq, they can be used with the tools Fairseq provides to fine-tune the model for ASR with CTC. The full procedure is well summarized in [this blogpost](https://huggingface.co/blog/fine-tune-wav2vec2-english).

⚠️ Important Note

Due to the nature of CTC, speech-to-text results aren't expected to be state-of-the-art. Also, future features might appear depending on the involvement of Fairseq and HuggingFace in this area.
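
For readers unfamiliar with CTC, here is a minimal sketch of greedy CTC decoding in plain Python. It is not the Fairseq or HuggingFace implementation, only an illustration of the rule a CTC output layer relies on: pick one label per frame, collapse consecutive repeats, then drop the blank symbol. Each frame is labeled independently and no language model is involved, which is one commonly cited reason why a plain CTC head tends to lag behind richer decoders. The toy vocabulary and frame labels are made up for the example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeated labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical toy vocabulary; id 0 is the CTC blank.
vocab = {0: "<blank>", 1: "e", 2: "l"}

# Per-frame argmax labels for the French word "elle".
# The blank between the two runs of "l" keeps the double letter.
with_blank = [1, 1, 2, 0, 2, 2, 1]
without_blank = [1, 1, 2, 2, 2, 1]

print(ctc_greedy_decode(with_blank, vocab))     # elle
print(ctc_greedy_decode(without_blank, vocab))  # ele
```

The second call shows why the blank matters: without it, the repeated "l" frames collapse into a single letter.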

Integrate with SpeechBrain for ASR, Speaker Recognition, Source Separation ...

Pretrained wav2vec models have recently gained in popularity. At the same time, the SpeechBrain toolkit emerged, offering a new and simpler way to work with state-of-the-art speech and deep-learning technologies.

Although it is currently in beta, SpeechBrain offers two ways to integrate wav2vec2 models trained with Fairseq (i.e., our LeBenchmark models):

Extract wav2vec2 features on the fly (with a frozen wav2vec2 encoder) and combine them with any speech-related architecture. Examples include: E2E ASR with CTC+Att+Language Models, Speaker Recognition or Verification, Source Separation...
Experimental: To get the most out of wav2vec2, the best approach is to fine-tune the model while training your downstream task. SpeechBrain makes this possible by turning on a single flag, so our wav2vec2 models can be fine-tuned while you train your preferred ASR pipeline or Speaker Recognizer.
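
The difference between the two modes can be sketched outside SpeechBrain with plain PyTorch and the HuggingFace Transformers library. The code below is not SpeechBrain's API. It assumes that the checkpoint loads with `Wav2Vec2Model.from_pretrained` and that the input is 16 kHz mono audio; whether the waveform should also be normalized depends on the checkpoint's preprocessing configuration, which is not covered here. The helper names `set_frozen`, `load_encoder`, and `encode` are hypothetical.

```python
MODEL_ID = "LeBenchmark/wav2vec2-FR-7K-base"


def set_frozen(model, frozen):
    """Switch a model between frozen feature extraction and fine-tuning."""
    for param in model.parameters():
        param.requires_grad = not frozen
    # eval() turns off dropout and other training-only behavior;
    # train() turns it back on for fine-tuning.
    if frozen:
        model.eval()
    else:
        model.train()
    return model


def load_encoder(model_id=MODEL_ID, frozen=True):
    """Download the checkpoint and put it in the requested mode."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import Wav2Vec2Model

    return set_frozen(Wav2Vec2Model.from_pretrained(model_id), frozen)


def encode(model, waveform):
    """Map a float tensor of shape (batch, samples) to frame-level features."""
    import torch

    # Track gradients only if at least one parameter is trainable.
    needs_grad = any(p.requires_grad for p in model.parameters())
    with torch.set_grad_enabled(needs_grad):
        return model(waveform).last_hidden_state  # (batch, frames, hidden)
```

With `frozen=True` the encoder acts as a fixed feature extractor whose outputs feed a separate downstream model. With `frozen=False` its parameters receive gradients, so they can be passed to the same optimizer as the downstream task, which is the idea behind the experimental mode described above.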

💡 Usage Tip

If interested, simply follow this tutorial

Referencing LeBenchmark

@misc{parcollet2023lebenchmark,
      title={LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech}, 
      author={Titouan Parcollet and Ha Nguyen and Solene Evain and Marcely Zanon Boito and Adrien Pupier and Salima Mdhaffar and Hang Le and Sina Alisamir and Natalia Tomashenko and Marco Dinarelli and Shucong Zhang and Alexandre Allauzen and Maximin Coavoux and Yannick Esteve and Mickael Rouvier and Jerome Goulian and Benjamin Lecouteux and Francois Portet and Solange Rossato and Fabien Ringeval and Didier Schwab and Laurent Besacier},
      year={2023},
      eprint={2309.05472},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

📄 License

The pretrained wav2vec2 models are distributed under the Apache-2.0 license.
