Open-source model wav2vec2-FR-2.6K-base: a wav2vec2 base model pre-trained on French speech for speech recognition applications

Wav2vec2 FR 2.6K Base

Developed by LeBenchmark

Base wav2vec2 model trained on 2.6K hours of French speech data, excluding spontaneous speech

Speech Recognition

Transformers

French

Open Source License: Apache-2.0

#French speech recognition #Multi-scenario pre-training #Self-supervised learning

Downloads 41

Release Time : 3/2/2022

Model Overview

A French speech model pre-trained by LeBenchmark with the wav2vec2 architecture, intended as a starting point for speech-related tasks. This version is the Base model, trained without spontaneous speech in its data.

Model Features

French speech optimization

Pre-trained specifically on French speech data covering several speech types (read speech, broadcast speech, etc.)

Multi-scenario applicability

Supports fine-tuning for various downstream tasks like speech recognition, speaker verification, etc.

Standardized framework

Trained on LeBenchmark's standardized framework to ensure reproducible results

Model Capabilities

Speech feature extraction

Speech recognition

Speaker verification

Source separation

Use Cases

Speech processing

French speech recognition

Achieve French speech-to-text through model fine-tuning

Source separation

Separate speech in multi-speaker scenarios using the SpeechBrain toolkit

🚀 LeBenchmark: wav2vec2 base model trained on 2.6K hours of French speech

LeBenchmark provides a set of wav2vec2 models pretrained on various French datasets covering spontaneous, read, and broadcast speech. It comes in two versions; the later one (LeBenchmark 2.0) extends the first in both the number of pre-trained SSL models and the number of downstream tasks. For more details on evaluating the wav2vec2 models, refer to our paper: LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

✨ Features

Model and data descriptions

We've released four different models available under our HuggingFace organization. Four wav2vec2 architectures (Light, Base, Large, and xLarge) are paired with our small (1K), medium (3K), large (7K), and extra-large (14K) corpora.

LeBenchmark 2.0

[wav2vec2-FR-14K-xlarge](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-xlarge): xLarge wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
[wav2vec2-FR-14K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-large): Large wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
[wav2vec2-FR-14K-light](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-light): Light wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).

LeBenchmark

[wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large): Large wav2vec2 trained on 7.6K hours of French speech (1.8K Males / 1.0K Females / 4.8K unknown).
[wav2vec2-FR-7K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-base): Base wav2vec2 trained on 7.6K hours of French speech (1.8K Males / 1.0K Females / 4.8K unknown).
[wav2vec2-FR-3K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-large): Large wav2vec2 trained on 2.9K hours of French speech (1.8K Males / 1.0K Females / 0.1K unknown).
[wav2vec2-FR-3K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-base): Base wav2vec2 trained on 2.9K hours of French speech (1.8K Males / 1.0K Females / 0.1K unknown).
[wav2vec2-FR-2.6K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-2.6K-base): Base wav2vec2 trained on 2.6K hours of French speech (no spontaneous speech).
[wav2vec2-FR-1K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-1K-large): Large wav2vec2 trained on 1K hours of French speech (0.5K Males / 0.5K Females).
[wav2vec2-FR-1K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-1K-base): Base wav2vec2 trained on 1K hours of French speech (0.5K Males / 0.5K Females).
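
The checkpoint names above follow a regular pattern (corpus tag, then architecture), so a Hub identifier can be assembled from those two parts. The snippet below is a minimal sketch of such a lookup; `HUB_ORG`, `CHECKPOINTS`, and `model_id` are hypothetical names introduced for illustration, and the table only mirrors the checkpoints listed in this card.

```python
HUB_ORG = "LeBenchmark"

# (corpus tag, architecture) pairs listed in this card; not an official registry.
CHECKPOINTS = {
    ("14K", "xlarge"), ("14K", "large"), ("14K", "light"),
    ("7K", "large"), ("7K", "base"),
    ("3K", "large"), ("3K", "base"),
    ("2.6K", "base"),
    ("1K", "large"), ("1K", "base"),
}


def model_id(corpus: str, arch: str) -> str:
    """Build the Hub identifier for a listed checkpoint, e.g. ("2.6K", "base")."""
    key = (corpus.upper(), arch.lower())
    if key not in CHECKPOINTS:
        raise KeyError(f"no checkpoint listed for corpus={corpus!r}, arch={arch!r}")
    return f"{HUB_ORG}/wav2vec2-FR-{key[0]}-{key[1]}"


print(model_id("2.6K", "base"))  # LeBenchmark/wav2vec2-FR-2.6K-base
```

Rejecting unlisted combinations (for example an xLarge model on the 1K corpus) avoids building an identifier for a checkpoint this card does not mention.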

Intended uses & limitations

Pretrained wav2vec2 models are distributed under the Apache-2.0 license, allowing extensive reuse without strict limitations. However, benchmarks and data may be associated with corpora that are not fully open-sourced.

Fine-tune with Fairseq for ASR with CTC

Since our wav2vec2 models were trained with Fairseq, they can be used in the tools provided by Fairseq to fine-tune the model for ASR with CTC. The full process is well-summarized in [this blogpost](https://huggingface.co/blog/fine-tune-wav2vec2-english).

Note that due to the nature of CTC, speech-to-text results are not expected to be state-of-the-art. Future features may emerge depending on the involvement of Fairseq and HuggingFace.
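
To make the CTC remark concrete, here is a minimal sketch of greedy CTC decoding in plain Python. It is an illustration, not the Fairseq or HuggingFace implementation: the blank symbol `_`, the function name, and the toy frame sequence are all assumptions. A CTC model emits one label per frame, and decoding merges repeated labels and then drops blanks. A commonly cited limitation is that each frame is labeled independently, with no language model involved unless one is added separately.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real vocabularies define their own


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse runs of identical labels, then remove blank labels."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != blank)


# Toy per-frame predictions (one label per frame).
frames = list("__bb_oo_nn__jj_oo_uu_r__")
print(ctc_greedy_decode(frames))  # bonjour
```

A blank between two identical labels is what keeps a genuine double letter: `aa_ll_ll_e` decodes to `alle`, whereas `aalle` decodes to `ale`.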

Integrate with SpeechBrain for ASR, Speaker, Source Separation ...

Pretrained wav2vec models have become popular recently. Meanwhile, the SpeechBrain toolkit offers a new and simpler approach to state-of-the-art speech and deep-learning technologies.

Although it's currently in beta, SpeechBrain provides two ways to integrate wav2vec2 models trained with Fairseq (i.e., our LeBenchmark models):

Extract wav2vec2 features on-the-fly (with a frozen wav2vec2 encoder) to combine with any speech-related architecture, such as E2E ASR with CTC+Att+Language Models, Speaker Recognition or Verification, Source Separation, etc.
Experimental: To fully utilize wav2vec2, fine-tuning the model during downstream task training is the best option. This is easily achievable in SpeechBrain by turning on a flag, so our wav2vec2 models can be fine-tuned while training your preferred ASR pipeline or Speaker Recognizer.

If interested, follow this [tutorial](https://colab.research.google.com/drive/17Hu1pxqhfMisjkSgmM2CnZxfqDyn2hSY?usp=sharing).
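
The difference between the two modes above (frozen encoder versus fine-tuned encoder) can be sketched outside SpeechBrain. The example below uses HuggingFace Transformers and PyTorch instead, which is a substitution made for illustration and not the SpeechBrain recipe. It assumes the checkpoint loads with `Wav2Vec2Model`, that the input is 16 kHz mono audio, and that the convolutional front-end uses the standard wav2vec2 kernel and stride settings. `num_frames`, `load_encoder`, and `extract_features` are hypothetical helper names.

```python
# Standard wav2vec2 convolutional front-end: (kernel, stride) per layer (assumed here).
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_frames(num_samples: int) -> int:
    """Number of feature frames the front-end yields for a waveform of this length."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length


def load_encoder(model_id: str = "LeBenchmark/wav2vec2-FR-2.6K-base", freeze: bool = True):
    """Load the encoder: frozen for feature extraction, trainable for fine-tuning."""
    from transformers import Wav2Vec2Model  # lazy import: heavy dependency

    model = Wav2Vec2Model.from_pretrained(model_id)
    if freeze:
        model.requires_grad_(False)  # no gradients reach the encoder weights
        model.eval()                 # inference mode: disables dropout
    else:
        model.train()                # encoder is updated with the downstream model
    return model


def extract_features(model, waveform):
    """Frozen-encoder use: 1-D 16 kHz waveform in, (frames, hidden_size) tensor out."""
    import torch

    wav = torch.as_tensor(waveform, dtype=torch.float32)
    wav = (wav - wav.mean()) / (wav.std() + 1e-7)  # assumed zero-mean, unit-variance input
    with torch.no_grad():
        hidden = model(wav.unsqueeze(0)).last_hidden_state
    return hidden.squeeze(0)


print(num_frames(16000))  # 49
```

Under these assumptions one second of audio yields 49 frames, roughly one frame every 20 ms. Calling `load_encoder(freeze=False)` corresponds to the experimental fine-tuning mode, where the encoder is called directly inside the downstream training loop instead of through `extract_features`.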

📄 License

Pretrained wav2vec2 models are distributed under the Apache-2.0 license.

📚 Documentation

Referencing LeBenchmark

@misc{parcollet2023lebenchmark,
      title={LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech},
      author={Titouan Parcollet and Ha Nguyen and Solene Evain and Marcely Zanon Boito and Adrien Pupier and Salima Mdhaffar and Hang Le and Sina Alisamir and Natalia Tomashenko and Marco Dinarelli and Shucong Zhang and Alexandre Allauzen and Maximin Coavoux and Yannick Esteve and Mickael Rouvier and Jerome Goulian and Benjamin Lecouteux and Francois Portet and Solange Rossato and Fabien Ringeval and Didier Schwab and Laurent Besacier},
      year={2023},
      eprint={2309.05472},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
