Automatic Speech Recognition for Belarusian language
This project offers a fine-tuned model for automatic speech recognition in the Belarusian language, leveraging the wav2vec2 architecture and enhancing performance with a language model.
Quick Start
This is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the Belarusian (`be`) subset of the `mozilla-foundation/common_voice_8_0` dataset.
The `Train`, `Dev`, and `Test` splits from the dataset were used as is. No additional data from the `Validated` split was used. Only one voicing of each sentence was employed, following the data split produced by [CommonVoice CorporaCreator](https://github.com/common-voice/CorporaCreator). To build a better model, one can use additional voicings from the `Validated` split for sentences already in the `Train`, `Dev`, and `Test` splits, effectively enlarging these splits.
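The enlargement idea above can be sketched as follows. This is a minimal illustration, not code from this project: it assumes each split has already been loaded from its Common Voice TSV file into a list of `(path, sentence)` tuples, and the function name `enlarge_splits` and the toy clip names are hypothetical. Each extra voicing goes to the split that already owns its sentence, so no sentence crosses between the training and evaluation splits.

```python
def enlarge_splits(splits, validated):
    """Add extra voicings from Validated to the split that already owns each sentence.

    splits: dict mapping split name -> list of (clip_path, sentence) tuples
    validated: list of (clip_path, sentence) tuples from the Validated split
    Returns a new dict; the input lists are not modified.
    """
    # Map every sentence to the split CorporaCreator assigned it to.
    owner = {}
    for name, rows in splits.items():
        for _, sentence in rows:
            owner[sentence] = name
    # Clips that are already part of some split must not be added twice.
    used = {path for rows in splits.values() for path, _ in rows}
    enlarged = {name: list(rows) for name, rows in splits.items()}
    for path, sentence in validated:
        name = owner.get(sentence)
        # Skip sentences that no split owns and clips that are already used.
        if name is not None and path not in used:
            enlarged[name].append((path, sentence))
            used.add(path)
    return enlarged


splits = {
    "train": [("a.mp3", "добры дзень")],
    "dev": [("b.mp3", "дзякуй")],
    "test": [("c.mp3", "да пабачэння")],
}
validated = [
    ("a.mp3", "добры дзень"),  # already in train: skipped
    ("d.mp3", "добры дзень"),  # new voicing of a train sentence
    ("e.mp3", "дзякуй"),       # new voicing of a dev sentence
    ("f.mp3", "новы сказ"),    # sentence not in any split: skipped
]
result = enlarge_splits(splits, validated)
print({k: [p for p, _ in v] for k, v in result.items()})
# {'train': ['a.mp3', 'd.mp3'], 'dev': ['b.mp3', 'e.mp3'], 'test': ['c.mp3']}
```

Note that enlarging `Dev` and `Test` this way also changes the evaluation sets, so results would no longer be directly comparable with the numbers reported below.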
A language model was built using KenLM: a 5-gram language model trained on sentences from the `Train + (Other - Dev - Test)` splits of the `mozilla-foundation/common_voice_8_0` `be` dataset, that is, all `Train` sentences plus those `Other` sentences that do not occur in `Dev` or `Test`.
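The corpus selection above can be sketched in a few lines. This is an illustration under assumptions, not this project's source code: each split is taken to be a plain list of sentence strings, the function name `lm_corpus` is hypothetical, duplicates are kept because the original setup does not say otherwise, and any text normalization needed to match the acoustic model's vocabulary is not shown.

```python
def lm_corpus(train, other, dev, test):
    """Collect LM training sentences as Train + (Other - Dev - Test).

    Each argument is a list of sentence strings from one Common Voice split.
    Other sentences that also occur in Dev or Test are dropped, so that
    evaluation sentences never reach the language model.
    Duplicates are kept (an assumption made for this sketch).
    """
    held_out = set(dev) | set(test)
    return list(train) + [s for s in other if s not in held_out]


train = ["добры дзень"]
other = ["дзякуй", "новы сказ", "да пабачэння"]
dev = ["дзякуй"]
test = ["да пабачэння"]
corpus = lm_corpus(train, other, dev, test)
print(corpus)  # ['добры дзень', 'новы сказ']

# Written one sentence per line, the corpus can be passed to KenLM, e.g.:
#   lmplz -o 5 < corpus.txt > be_5gram.arpa
```

The `-o 5` flag sets the n-gram order to 5, matching the model described above; the file names are placeholders.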
The source code is available here.
Features
- Fine-tuned Model: Based on [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base), fine-tuned on the Belarusian (`be`) subset of the `mozilla-foundation/common_voice_8_0` dataset.
- Language Model: A 5-gram language model built with KenLM to improve recognition accuracy.
- Interactive Demo: An interactive demo widget on this page lets you test the acoustic model in the browser. You can also test the full pipeline of acoustic model + language model on the [Spaces page](https://huggingface.co/spaces/ales/wav2vec2-cv-be-lm).
Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | wav2vec2 |
| Training Data | mozilla-foundation/common_voice_8_0 (Belarusian language) |
Results
The model was evaluated on the Belarusian portion of the Common Voice 8 dataset, both with and without the language model (LM). The following word error rates (WER, lower is better) were obtained:
| Task | Dataset | Metric | Value (%) |
|---|---|---|---|
| Automatic Speech Recognition | Common Voice 8 (be) | Dev WER | 17.61 |
| Automatic Speech Recognition | Common Voice 8 (be) | Test WER | 18.7 |
| Automatic Speech Recognition | Common Voice 8 (be) (with LM) | Dev WER | 11.5 |
| Automatic Speech Recognition | Common Voice 8 (be) (with LM) | Test WER | 12.4 |
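The values in the table are word error rates in percent. The standard definition is WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the hypothesis into the reference, and N is the number of reference words. The sketch below implements that definition in plain Python; it is not the evaluation script behind the reported numbers, and the text normalization used for them is not stated here. The function names and example sentences are illustrative.

```python
def edit_distance(ref, hyp):
    """Minimum number of word substitutions, deletions and insertions."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j];
    # it starts as the distance from the empty ref prefix, which is j.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute r with h (free on a match)
            )
        prev = cur
    return prev[-1]


def corpus_wer(pairs):
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return 100.0 * errors / words


pairs = [
    ("добры дзень сябры", "добры дзен сябры"),  # 1 substitution
    ("дзякуй", "дзякуй"),                        # exact match
]
print(corpus_wer(pairs))  # 25.0
```

Errors are summed over the whole test set before dividing, rather than averaging per-utterance rates, so long utterances weigh proportionally more.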
Running the Model
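Run the model in a browser
This page contains an interactive demo widget that lets you test this model directly in a browser. However, the widget runs only the acoustic model, without the language model. Because the language model significantly improves overall accuracy (Test WER drops from 18.7 to 12.4), the widget's transcriptions are less accurate than those of the full pipeline.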
You can experiment with the full pipeline of acoustic model + language model on the following [Spaces page](https://huggingface.co/spaces/ales/wav2vec2-cv-be-lm) (also accessible from the browser).
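Without a language model, the per-frame outputs of a CTC model such as wav2vec2 are commonly turned into text by greedy (best-path) decoding: take the top token in every frame, merge consecutive repeats, and drop the blank token. The plain-Python sketch below illustrates that step. The blank id `0` and the `|` word delimiter are assumptions: `|` is the usual word delimiter in Hugging Face wav2vec2 CTC vocabularies, but the blank (padding) token id varies between vocabularies and should be checked against this model's. The toy vocabulary and scores are made up.

```python
def ctc_greedy_decode(logits, vocab, blank_id=0, word_delimiter="|"):
    """Best-path CTC decoding without a language model.

    logits: per-frame scores as a list of lists, one row per audio frame
    and one column per vocabulary entry.
    blank_id and word_delimiter are assumptions about the model's vocabulary.
    """
    # Pick the highest-scoring token in every frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in logits]
    chars = []
    prev = None
    for token in best:
        # Merge consecutive repeats, then drop the CTC blank.
        if token != prev and token != blank_id:
            chars.append(vocab[token])
        prev = token
    # The delimiter token marks word boundaries; collapse extra spaces.
    return " ".join("".join(chars).replace(word_delimiter, " ").split())


vocab = ["<pad>", "|", "д", "а"]  # toy vocabulary; "<pad>" acts as the blank
frame_ids = [2, 2, 0, 3, 1, 2, 3, 3]
frames = [[1.0 if k == i else 0.0 for k in range(len(vocab))] for i in frame_ids]
print(ctc_greedy_decode(frames, vocab))  # да да
```

The blank lets the model emit genuinely doubled letters (`д`, blank, `д` decodes to `дд`). The full pipeline on the Spaces page instead runs a beam search in which the KenLM 5-gram model helps score candidate transcriptions; which decoder library it uses is not stated on this page.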
License
This project is licensed under the GPL-3.0 license.