wav2vec2-FR-1K-base Open-source Speech Model - Free Deployment to Boost French Speech Recognition

Wav2vec2 FR 1K Base

Developed by LeBenchmark

A wav2vec2 base model trained on 1K hours of French speech, supporting tasks like speech recognition

French | Open Source License: Apache-2.0 | #French Speech Recognition #Self-supervised Learning #Multi-scenario Speech Processing

Downloads 85

Release Time : 3/2/2022

Model Overview

This wav2vec2 base model from LeBenchmark is trained on 1K hours of French speech, including spontaneous, read, and broadcast speech. It is suitable for French speech processing tasks.

Model Features

Multi-type Speech Training

The model is trained on a French dataset containing spontaneous, read, and broadcast speech.

Multiple Scales Available

Versions trained on corpora ranging from 1K to 14K hours of speech are available.

Gender-balanced Data

The 1K version is trained on 0.5K hours of male speech and 0.5K hours of female speech.

Model Capabilities

French Speech Recognition

Speech Feature Extraction

Speaker Recognition

Source Separation

Use Cases

Speech Processing

French Speech-to-Text

Transcribe French speech into text.

Speaker Recognition

Identify who is speaking in a recording.

🚀 LeBenchmark: wav2vec2 base model trained on 1K hours of French speech

LeBenchmark offers a set of pre-trained wav2vec2 models trained on various French datasets covering spontaneous, read, and broadcast speech. It comes in two versions. The later version (LeBenchmark 2.0) extends the first in both the number of pre-trained SSL models and the number of downstream tasks. For more details on the benchmarks used to evaluate the wav2vec2 models, please refer to our paper: LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

✨ Features

Model and data descriptions

We release several models, all of which can be found under our HuggingFace organization. Four wav2vec2 architectures (Light, Base, Large, and xLarge) are combined with our small (1K), medium (3K), large (7K), and extra-large (14K) corpora. In brief:

LeBenchmark 2.0:

[wav2vec2-FR-14K-xlarge](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-xlarge): xLarge wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
[wav2vec2-FR-14K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-large): Large wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
[wav2vec2-FR-14K-light](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-light): Light wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).

LeBenchmark:

[wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large): Large wav2vec2 trained on 7.6K hours of French speech (1.8K Males / 1.0K Females / 4.8K unknown).
[wav2vec2-FR-7K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-base): Base wav2vec2 trained on 7.6K hours of French speech (1.8K Males / 1.0K Females / 4.8K unknown).
[wav2vec2-FR-3K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-large): Large wav2vec2 trained on 2.9K hours of French speech (1.8K Males / 1.0K Females / 0.1K unknown).
[wav2vec2-FR-3K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-base): Base wav2vec2 trained on 2.9K hours of French speech (1.8K Males / 1.0K Females / 0.1K unknown).
[wav2vec2-FR-2.6K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-2.6K-base): Base wav2vec2 trained on 2.6K hours of French speech (no spontaneous speech).
[wav2vec2-FR-1K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-1K-large): Large wav2vec2 trained on 1K hours of French speech (0.5K Males / 0.5K Females).
[wav2vec2-FR-1K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-1K-base): Base wav2vec2 trained on 1K hours of French speech (0.5K Males / 0.5K Females).
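
As a quick illustration of how the checkpoint names listed above map to Hub repositories, here is a minimal Python sketch. The `RELEASED` table is transcribed from the list; the helper names `repo_id` and `hub_url` are hypothetical and not part of any LeBenchmark tooling. Note that not every architecture/corpus pair has a released checkpoint.

```python
HUB_ORG = "LeBenchmark"

# Corpus label -> architectures released for that corpus (copied from the list above).
RELEASED = {
    "14K": ("xlarge", "large", "light"),
    "7K": ("large", "base"),
    "3K": ("large", "base"),
    "2.6K": ("base",),
    "1K": ("large", "base"),
}


def repo_id(corpus: str, arch: str) -> str:
    """Build the Hub repository id for a released corpus/architecture pair."""
    archs = RELEASED.get(corpus)
    if archs is None or arch not in archs:
        raise ValueError(f"no released checkpoint for corpus={corpus!r}, arch={arch!r}")
    return f"{HUB_ORG}/wav2vec2-FR-{corpus}-{arch}"


def hub_url(corpus: str, arch: str) -> str:
    """Build the web URL of the corresponding model page."""
    return f"https://huggingface.co/{repo_id(corpus, arch)}"


if __name__ == "__main__":
    print(hub_url("1K", "base"))
```

Here `repo_id("1K", "base")` gives the identifier of the model described on this page, while `repo_id("1K", "xlarge")` raises `ValueError` because no such checkpoint appears in the list.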

Intended uses & limitations

Pretrained wav2vec2 models are distributed under the Apache-2.0 license. Thus, they can be reused extensively without strict limitations. However, benchmarks and data may be linked to corpora that are not completely open-sourced.

📚 Documentation

Fine-tune with Fairseq for ASR with CTC

As our wav2vec2 models were trained with Fairseq, they can be used in the different tools provided by Fairseq to fine-tune the model for ASR with CTC. The full procedure is summarized in [this blogpost](https://huggingface.co/blog/fine-tune-wav2vec2-english).

Please note that due to the nature of CTC, speech-to-text results aren't expected to be state-of-the-art. Moreover, future features might appear depending on the involvement of Fairseq and HuggingFace in this area.
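
To make the CTC output format more concrete, the following is a minimal sketch of greedy CTC decoding in plain Python: take the best label per frame, collapse consecutive repeats, then drop the blank symbol. It is an illustration only, not the decoding code used by Fairseq or by the authors. The three-symbol vocabulary, the blank index, and the function name `ctc_greedy_decode` are assumptions made up for this example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, vocab, blank_id=0):
    """Greedy CTC decoding over per-frame scores.

    frame_scores: list of per-frame score lists, one score per vocabulary entry.
    vocab: list of output symbols; vocab[blank_id] is the CTC blank.
    """
    # 1. Pick the highest-scoring label for each frame independently.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Collapse runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best)]
    # 3. Remove blanks; a blank between two identical labels keeps both.
    return "".join(vocab[label] for label in collapsed if label != blank_id)


if __name__ == "__main__":
    vocab = ["_", "e", "l"]  # hypothetical toy vocabulary, "_" is the blank
    # One-hot scores spelling the frame path e e _ l l _ l e
    path = [1, 1, 0, 2, 2, 0, 2, 1]
    scores = [[1.0 if j == i else 0.0 for j in range(len(vocab))] for i in path]
    print(ctc_greedy_decode(scores, vocab))
```

Each frame is labelled independently here, and a repeated letter such as the double "l" in "elle" survives only when a blank separates the two runs.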

Integrate with SpeechBrain for ASR, Speaker Recognition, Source Separation ...

Pretrained wav2vec models have recently become more popular. At the same time, the SpeechBrain toolkit emerged, offering a new and simpler way to handle state-of-the-art speech and deep-learning technologies.

While it is currently in beta, SpeechBrain provides two different ways to nicely integrate wav2vec2 models trained with Fairseq, i.e., our LeBenchmark models!

1. Extract wav2vec2 features on-the-fly (with a frozen wav2vec2 encoder) to be combined with any speech-related architecture. Examples are: E2E ASR with CTC+Att+Language Models; Speaker Recognition or Verification, Source Separation ...
2. Experimental: To fully benefit from wav2vec2, the best solution is to fine-tune the model while training your downstream task. SpeechBrain makes this simple, as only a flag needs to be turned on. Thus, our wav2vec2 models can be fine-tuned while training your favorite ASR pipeline or Speaker Recognizer.
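
The difference between a frozen encoder and a fine-tuned one can also be sketched outside SpeechBrain. The snippet below is a hedged illustration that uses the HuggingFace `transformers` class `Wav2Vec2Model` rather than SpeechBrain's own wrapper, and it assumes the Hub repository ships weights in a `transformers`-compatible format. The `freeze` argument is this sketch's own stand-in for the SpeechBrain flag mentioned above. The convolution kernel sizes and strides are those of the standard wav2vec 2.0 front end, assumed here rather than read from the checkpoint.

```python
MODEL_ID = "LeBenchmark/wav2vec2-FR-1K-base"

# Standard wav2vec 2.0 convolutional front end (assumption; the checkpoint's
# config.json holds the authoritative values).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def expected_num_frames(num_samples: int) -> int:
    """Number of feature frames the conv front end yields for a raw waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


def load_encoder(model_id: str = MODEL_ID, freeze: bool = True):
    """Load the encoder; freeze=True turns it into a fixed feature extractor."""
    # Imported lazily so the frame-count helper works without transformers.
    from transformers import Wav2Vec2Model

    model = Wav2Vec2Model.from_pretrained(model_id)
    if freeze:
        model.eval()
        for param in model.parameters():
            param.requires_grad = False
    else:
        model.train()  # weights will be updated along with the downstream task
    return model


def extract_features(model, waveform, freeze: bool = True):
    """waveform: 1-D float tensor of 16 kHz mono audio.

    Whether the input should be normalized first depends on the checkpoint's
    preprocessing configuration, which this sketch does not inspect.
    """
    import torch

    with torch.set_grad_enabled(not freeze):
        output = model(waveform.unsqueeze(0))  # add a batch dimension
    return output.last_hidden_state  # shape: (1, frames, hidden_size)


if __name__ == "__main__":
    print(expected_num_frames(16000))
```

Under these assumptions, one second of 16 kHz audio (16,000 samples) yields 49 frames, or roughly one feature vector every 20 ms. With `freeze=False`, gradients flow into the encoder so an optimizer can update it together with the downstream model.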

If interested, simply follow this tutorial

📄 License

The pretrained wav2vec2 models are distributed under the Apache - 2.0 license.

Referencing LeBenchmark

@misc{parcollet2023lebenchmark,
      title={LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech},
      author={Titouan Parcollet and Ha Nguyen and Solene Evain and Marcely Zanon Boito and Adrien Pupier and Salima Mdhaffar and Hang Le and Sina Alisamir and Natalia Tomashenko and Marco Dinarelli and Shucong Zhang and Alexandre Allauzen and Maximin Coavoux and Yannick Esteve and Mickael Rouvier and Jerome Goulian and Benjamin Lecouteux and Francois Portet and Solange Rossato and Fabien Ringeval and Didier Schwab and Laurent Besacier},
      year={2023},
      eprint={2309.05472},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
