w2v-bert-uk v2.1
This is an automatic speech recognition (ASR) model for the Ukrainian language. It is based on the facebook/w2v-bert-2.0 model and reaches a WER of 17.34% and a CER of 3.33% on the Ukrainian test split of common_voice_10_0.
Quick Start
Prerequisites
Make sure you have installed the necessary libraries:
pip install -U torch soundfile transformers
Code Example
import torch
import soundfile as sf
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

model_name = 'Yehor/w2v-bert-uk-v2.1'
# Use the GPU with half precision when available, otherwise fall back to CPU with float32
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
sampling_rate = 16_000

asr_model = AutoModelForCTC.from_pretrained(model_name, torch_dtype=torch_dtype).to(device)
processor = Wav2Vec2BertProcessor.from_pretrained(model_name)

paths = [
    'sample1.wav',
]

# Read each file as a waveform; the audio is expected to be sampled at 16 kHz
audio_inputs = []
for path in paths:
    audio_input, _ = sf.read(path)
    audio_inputs.append(audio_input)

# Extract input features and move them to the model's device and dtype
inputs = processor(audio_inputs, sampling_rate=sampling_rate).input_features
features = torch.tensor(inputs).to(device, dtype=torch_dtype)

# Greedy decoding: pick the most likely token per frame, then decode to text
with torch.inference_mode():
    logits = asr_model(features).logits

predicted_ids = torch.argmax(logits, dim=-1)
predictions = processor.batch_decode(predicted_ids)

print('Predictions:')
print(predictions)
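The last two steps above (`torch.argmax` over the logits, then `processor.batch_decode`) amount to greedy CTC decoding: take the most likely token for each frame, merge consecutive repeats, and drop the blank token. The sketch below illustrates that idea in plain Python on a toy input. The vocabulary and the blank id of 0 are assumptions made for illustration and are not the model's actual tokenizer settings.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real one ships with the model's tokenizer.
TOY_VOCAB = {0: "", 1: "к", 2: "і", 3: "т", 4: " "}
BLANK_ID = 0  # assumed blank id for this illustration only


def greedy_ctc_decode(frame_ids, vocab=TOY_VOCAB, blank_id=BLANK_ID):
    """Collapse repeated frame predictions, then remove blanks."""
    # groupby merges each run of identical consecutive ids into one item.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(vocab[t] for t in collapsed if t != blank_id)


# Per-frame argmax ids: repeats are merged and blanks are dropped.
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 3, 0]
decoded = greedy_ctc_decode(frame_ids)
print(decoded)
```

Because repeats are merged before blanks are removed, a blank between two identical ids is what lets a genuinely doubled character survive: `[1, 0, 1]` decodes to two characters, while `[1, 1]` decodes to one.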
Features
- Automatic Speech Recognition: This model is specifically designed for automatic speech recognition tasks in the Ukrainian language.
- High Accuracy: Achieves a WER of 17.34% and a CER of 3.33% on the common_voice_10_0 dataset.
Installation
The model is loaded through the transformers library. Install the necessary dependencies with the following command:
pip install -U torch soundfile transformers
Usage Examples
Basic Usage
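The Quick Start code passes `sampling_rate = 16_000` to the processor but discards the rate returned by `sf.read`, so a file recorded at a different rate would be processed without any warning. A minimal sketch of a pre-flight check using Python's standard `wave` module follows. The helper name `check_wav` is hypothetical, and the check only handles uncompressed PCM WAV files.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16_000  # the rate used in the Quick Start example


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Return (rate, channels) for a PCM WAV file; raise on a rate mismatch."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    if rate != expected_rate:
        raise ValueError(f"{path}: expected {expected_rate} Hz, got {rate} Hz")
    return rate, channels


# Demo: write 10 ms of 16 kHz mono silence to a temporary file and check it.
tmp_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(tmp_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(EXPECTED_RATE)
    out.writeframes(b"\x00\x00" * 160)

result = check_wav(tmp_path)
print(result)
```

A file that fails the check would need to be resampled to 16 kHz with an audio library of your choice before being passed to the processor.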
Documentation
Community
See other Ukrainian models: https://github.com/egorsmkv/speech-recognition-uk
Overview
This model is the successor to https://huggingface.co/Yehor/w2v-bert-uk
Metrics
- AM (F16):
  - WER: 0.1734 (17.34%)
  - CER: 0.0333 (3.33%)
  - Accuracy on words: 82.66%
  - Accuracy on chars: 96.67%
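WER and CER are edit-distance ratios, and the accuracy figures above are one minus the corresponding error rate (for example, 1 - 0.1734 = 0.8266). The sketch below shows the computation in plain Python, assuming the usual Levenshtein-based definition. The author's evaluation script is not shown in this card and may normalize text differently, and the sentence pair is made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example pair, not taken from the evaluation data.
reference = "добрий день усім"
hypothesis = "добрий динь усім"
word_error = wer(reference, hypothesis)
char_error = cer(reference, hypothesis)
print(f"WER: {word_error:.4f}, word accuracy: {1 - word_error:.2%}")
print(f"CER: {char_error:.4f}, char accuracy: {1 - char_error:.2%}")
```

A single wrong character costs one whole word at the word level but only one character at the character level, which is why CER is typically much lower than WER for the same output.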
Demo
Use the https://huggingface.co/spaces/Yehor/w2v-bert-uk-v2.1-demo Space to try the model on your own audio files.
Model Information
| Property | Details |
|---|---|
| Base Model | facebook/w2v-bert-2.0 |
| Library Name | transformers |
| Language | uk |
| License | apache-2.0 |
| Task Categories | automatic-speech-recognition |
| Tags | audio |
| Datasets | Yehor/openstt-uk |
| Metrics | wer |
Model Index
- Name: w2v-bert-uk-v2.1
- Results:
  - Task:
    - Name: Automatic Speech Recognition
    - Type: automatic-speech-recognition
  - Dataset:
    - Name: common_voice_10_0
    - Type: common_voice_10_0
    - Config: uk
    - Split: test
    - Args: uk
  - Metrics:
    - Name: WER
      - Type: wer
      - Value: 17.34
    - Name: CER
      - Type: cer
      - Value: 3.33
License
This model is licensed under the apache-2.0 license.
Cite this work
@misc {smoliakov_2025,
author = { {Smoliakov} },
title = { w2v-bert-uk-v2.1 (Revision 094c59d) },
year = 2025,
url = { https://huggingface.co/Yehor/w2v-bert-uk-v2.1 },
doi = { 10.57967/hf/4554 },
publisher = { Hugging Face }
}