Wav2vec2-cv-be Open-source Speech Recognition Model - Free Deployment for Accurate Recognition of Belarusian Speech

wav2vec2-cv-be

Developed by ales

An automatic speech recognition model based on facebook/wav2vec2-base and fine-tuned on the Belarusian portion of the Common Voice 8 dataset

Speech Recognition

Transformers

Open Source License: GPL-3.0

#Belarusian ASR #wav2vec2 fine-tuning #low word error rate

Downloads 278

Release Time: 4/13/2022

Model Overview

This is an automatic speech recognition (ASR) model for the Belarusian language, based on the wav2vec2 architecture and fine-tuned on the Belarusian portion of the Mozilla Common Voice 8.0 dataset.

Model Features

High Accuracy Recognition

Achieves a 12.4% word error rate on the Common Voice 8 test set with the language model (18.7% without it)

Language Model Integration

Includes a 5-gram language model built with KenLM, which lowers the test word error rate from 18.7% to 12.4%

Browser Compatibility

Provides interactive demo components that can run directly in browsers

Model Capabilities

Belarusian speech-to-text

Real-time speech recognition

Audio content transcription

Use Cases

Speech Transcription

Belarusian Speech Transcription

Convert Belarusian speech content into text

12.4% word error rate (test set)

Voice Assistants

Belarusian Voice Interaction

Provide recognition capability for Belarusian voice assistants

🚀 Automatic Speech Recognition for the Belarusian language

This project offers a fine-tuned model for automatic speech recognition in the Belarusian language. It builds on the wav2vec2 architecture and uses a language model to improve accuracy.

🚀 Quick Start

This is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base), trained on the Belarusian (be) subset of the mozilla-foundation/common_voice_8_0 dataset.

The Train, Dev, and Test splits of the dataset were used as they are. No additional data from the Validated split was used. Only one voicing (recording) of each sentence was used, following the data split produced by [CommonVoice CorporaCreator](https://github.com/common-voice/CorporaCreator). To build a better model, one can add further voicings from the Validated split for sentences that already appear in the Train, Dev, and Test splits, effectively enlarging these splits.
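The enlargement idea described above can be sketched in a few lines. This is an illustrative sketch, not the author's code: it assumes each clip is a (sentence, audio path) pair, and the function name `add_extra_voicings` and all sample data are hypothetical.

```python
from collections import defaultdict


def add_extra_voicings(splits, validated):
    """Return enlarged copies of the splits.

    splits:    dict mapping split name -> list of (sentence, clip_path)
    validated: list of (sentence, clip_path) taken from the Validated split

    A validated clip is added to a split only if its sentence already
    belongs to that split, so no sentence leaks across splits.
    """
    # Map every sentence to the split that already contains it.
    owner = {}
    for name, clips in splits.items():
        for sentence, _ in clips:
            owner[sentence] = name

    # Start from the original clips and remember their paths.
    enlarged = defaultdict(list)
    seen_paths = set()
    for name, clips in splits.items():
        for clip in clips:
            enlarged[name].append(clip)
            seen_paths.add(clip[1])

    # Add each new voicing to the split that owns its sentence.
    for sentence, path in validated:
        name = owner.get(sentence)
        if name is not None and path not in seen_paths:
            enlarged[name].append((sentence, path))
            seen_paths.add(path)
    return dict(enlarged)


# Hypothetical toy data.
splits = {
    "train": [("добры дзень", "a1.mp3")],
    "dev": [("як справы", "b1.mp3")],
    "test": [("да пабачэння", "c1.mp3")],
}
validated = [
    ("добры дзень", "a1.mp3"),  # already in train, skipped
    ("добры дзень", "a2.mp3"),  # extra voicing of a train sentence
    ("як справы", "b2.mp3"),    # extra voicing of a dev sentence
    ("новы сказ", "d1.mp3"),    # sentence not in any split, ignored
]
enlarged = add_extra_voicings(splits, validated)
print({name: len(clips) for name, clips in enlarged.items()})
```

Keeping every voicing of a sentence inside the split that already owns that sentence means the Dev and Test sets stay disjoint from Train at the sentence level.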

A 5-gram language model was built with KenLM from the sentences of the Train + (Other - Dev - Test) splits of the mozilla-foundation/common_voice_8_0 be dataset, that is, the Train sentences plus the Other sentences that do not appear in Dev or Test.
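The corpus selection for the language model can be illustrated with plain set arithmetic. The sketch below reads "Train + (Other - Dev - Test)" as "Train sentences plus those Other sentences that are in neither Dev nor Test"; that reading, the function name `build_lm_corpus`, and the sample sentences are assumptions made for illustration.

```python
def build_lm_corpus(train, other, dev, test):
    """Return the sentences for LM training: Train + (Other - Dev - Test)."""
    # Sentences that must never reach the language model.
    held_out = set(dev) | set(test)
    # Keep only the Other sentences that are not used for evaluation.
    extra = [s for s in other if s not in held_out]
    # Drop duplicates while preserving the original order.
    return list(dict.fromkeys(list(train) + extra))


# Hypothetical toy data.
train = ["добры дзень", "як справы"]
other = ["як справы", "дзякуй вялікі", "да пабачэння", "калі ласка"]
dev = ["дзякуй вялікі"]
test = ["да пабачэння"]

corpus = build_lm_corpus(train, other, dev, test)
print("\n".join(corpus))

# With one sentence per line saved to a text file, a 5-gram model is
# typically estimated with KenLM's command-line tool, for example:
#   lmplz -o 5 < corpus.txt > lm.arpa
```

Excluding the Dev and Test sentences matters here: if the language model had seen the evaluation text, the "with LM" word error rates would be overly optimistic.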

The source code is available here.

✨ Features

Fine-tuned Model: Based on [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base), fine-tuned on the mozilla-foundation/common_voice_8_0 be dataset.
Language Model: A 5-gram language model built using KenLM to enhance performance.
Interactive Demo: An interactive demo widget on this page allows you to test the acoustic model in the browser. You can also test the full pipeline of the acoustic model + language model on the [spaces page](https://huggingface.co/spaces/ales/wav2vec2-cv-be-lm).

📚 Documentation

Model Information

| Property | Details |
| --- | --- |
| Model Type | wav2vec2 |
| Training Data | mozilla-foundation/common_voice_8_0 (Belarusian language) |

Results

The model was evaluated on the Common Voice 8 dataset for the Belarusian language. The following metrics were obtained:

| Task | Dataset | Metric | Value (%) |
| --- | --- | --- | --- |
| Automatic Speech Recognition | Common Voice 8 (be) | Dev WER | 17.61 |
| Automatic Speech Recognition | Common Voice 8 (be) | Test WER | 18.7 |
| Automatic Speech Recognition | Common Voice 8 (be) (with LM) | Dev WER | 11.5 |
| Automatic Speech Recognition | Common Voice 8 (be) (with LM) | Test WER | 12.4 |
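For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between reference and hypothesis divided by the number of reference words. The sketch below implements that standard definition and computes the relative improvement implied by the reported figures. It is not the evaluation script used for this model, and the function names are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before, after):
    """Fraction by which `after` is lower than `before`."""
    return (before - after) / before


# One substitution and one deletion against a 4-word reference.
print(word_error_rate("a b c d", "a x c"))

# Relative WER reduction from adding the language model (reported values).
print(round(100 * relative_reduction(17.61, 11.5), 1))  # Dev
print(round(100 * relative_reduction(18.7, 12.4), 1))   # Test
```

On the reported numbers, the language model removes roughly a third of the word errors on both the Dev and Test sets.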

Running the Model

Run model in a browser

This page contains an interactive demo widget that allows you to test this model directly in a browser. Note that the widget uses only the acoustic model. It does not apply the language model, which significantly improves overall performance (test WER 12.4 with the language model versus 18.7 without).

You can experiment with the full pipeline of acoustic model + language model on the following [spaces page](https://huggingface.co/spaces/ales/wav2vec2-cv-be-lm) (also accessible from the browser).
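Outside the browser, a model of this kind is usually loaded with the Hugging Face transformers library. The following is a minimal sketch under stated assumptions: the repository ID `ales/wav2vec2-cv-be` is inferred from the model name and author shown on this page, `sample.wav` is a hypothetical file, and decoding a file path requires ffmpeg to be installed. It does not claim to reproduce the acoustic model + language model setup of the spaces page.

```python
# Assumed repository ID, inferred from the model name and author on this page.
MODEL_ID = "ales/wav2vec2-cv-be"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file with the transformers ASR pipeline."""
    # Imported lazily so that defining this function does not require
    # transformers to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # For a single input the pipeline returns a dict with a "text" key.
    return asr(audio_path)["text"]


if __name__ == "__main__":
    import sys

    # Usage: python transcribe.py sample.wav  (hypothetical file name)
    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

The model is downloaded on the first call, so the first transcription takes noticeably longer than later ones.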

📄 License

This project is licensed under the GPL-3.0 license.
