The open-source Ukrainian voice recognition model w2v-bert-uk-v2.1 - Achieve accurate voice recognition for free


W2v Bert Uk V2.1

Developed by Yehor

Ukrainian speech recognition model fine-tuned on Yehor/openstt-uk dataset, based on facebook/w2v-bert-2.0

Speech Recognition

Transformers

Other

Open Source License: Apache-2.0

#Ukrainian speech recognition #Low Word Error Rate (WER) #High character accuracy

Downloads 492

Release Time : 8/7/2024

Model Overview

Ukrainian automatic speech recognition (ASR) model capable of converting Ukrainian speech to text

Model Features

High Accuracy

Achieves a 17.34% Word Error Rate (WER) and a 3.33% Character Error Rate (CER) on the Common Voice 10.0 Ukrainian test set

Optimized Inference

Supports FP16 precision inference for efficient operation on GPUs

Community Support

Backed by an active Ukrainian speech technology community

Model Capabilities

Ukrainian speech recognition

Audio-to-text conversion

Processes audio sampled at 16 kHz

Use Cases

Speech Transcription

Meeting Minutes Transcription

Convert Ukrainian meeting recordings into text transcripts

82.66% word accuracy, measured on the Common Voice 10.0 Ukrainian test set

Media Caption Generation

Automatically generate subtitles for Ukrainian video content

96.67% character accuracy, measured on the Common Voice 10.0 Ukrainian test set

🚀 w2v-bert-uk `v2.1`

This is an automatic speech recognition model for the Ukrainian language. It is fine-tuned from facebook/w2v-bert-2.0 on the Yehor/openstt-uk dataset and reaches a WER of 17.34% and a CER of 3.33% on the Common Voice 10.0 Ukrainian test set.

🚀 Quick Start

Prerequisites

Make sure you have installed the necessary libraries:

pip install -U torch soundfile transformers

Code Example

# pip install -U torch soundfile transformers

import torch
import soundfile as sf
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

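# Config
model_name = 'Yehor/w2v-bert-uk-v2.1'
# Use the GPU with FP16 when available, otherwise fall back to the CPU with FP32
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
sampling_rate = 16_000

# Load the model
asr_model = AutoModelForCTC.from_pretrained(model_name, torch_dtype=torch_dtype).to(device)
processor = Wav2Vec2BertProcessor.from_pretrained(model_name)

paths = [
  'sample1.wav',
]

# Read the audio files (each one is expected to be sampled at 16 kHz)
audio_inputs = []
for path in paths:
  audio_input, file_rate = sf.read(path)
  if file_rate != sampling_rate:
    raise ValueError(f'{path}: expected {sampling_rate} Hz, got {file_rate} Hz')
  audio_inputs.append(audio_input)

# Transcribe the audio
inputs = processor(audio_inputs, sampling_rate=sampling_rate).input_features
# Match the feature dtype to the model dtype instead of always casting to FP16
features = torch.tensor(inputs).to(device=device, dtype=torch_dtype)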

with torch.inference_mode():
  logits = asr_model(features).logits

predicted_ids = torch.argmax(logits, dim=-1)
predictions = processor.batch_decode(predicted_ids)

# Log results
print('Predictions:')
print(predictions)
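
The `torch.argmax` and `processor.batch_decode` steps above amount to greedy CTC decoding: pick the most likely token for each frame, merge consecutive repeats, then drop the blank token. The following is a minimal pure-Python sketch of that collapse step. The toy vocabulary and the blank id are hypothetical and chosen only for illustration; the real model's vocabulary and special tokens come from its tokenizer and are not reproduced here.

```python
from itertools import groupby

# Hypothetical toy vocabulary for illustration only; id 0 plays the CTC blank.
BLANK_ID = 0
VOCAB = {1: "т", 2: "а", 3: "к", 4: " "}


def ctc_greedy_collapse(frame_ids, blank_id=BLANK_ID):
    """Merge consecutive duplicate ids, then remove blank ids."""
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    return [token_id for token_id in merged if token_id != blank_id]


def ids_to_text(token_ids, vocab=VOCAB):
    """Map token ids to characters using the toy vocabulary."""
    return "".join(vocab[token_id] for token_id in token_ids)


# Per-frame argmax ids, shaped like one row of torch.argmax(logits, dim=-1).
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 0]
print(ids_to_text(ctc_greedy_collapse(frames)))
```

Note that a blank between two identical ids keeps both of them, which is how CTC represents doubled letters.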

✨ Features

Automatic Speech Recognition: This model is specifically designed for automatic speech recognition tasks in the Ukrainian language.
High Accuracy: Achieves a WER of 17.34% and a CER of 3.33% on the common_voice_10_0 dataset.

📦 Installation

The model is loaded through the transformers library, so no separate installation step is needed for the model itself. You can install the necessary dependencies with the following command:

pip install -U torch soundfile transformers

💻 Usage Examples

Basic Usage

See the code example under Quick Start for the basic usage of the model.
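
Since the model expects 16 kHz audio, it can help to validate files before transcription. Below is a small sketch of such a pre-flight check using only the standard-library `wave` module, which reads PCM WAV files. The helper names `check_wav` and `write_silence` are hypothetical, and the mono requirement is an assumption made for this illustration rather than something stated in the model card.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 16_000  # sample rate stated in the model card


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Return (ok, message) saying whether a PCM WAV file matches the expected format."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    if rate != expected_rate:
        return False, f"sample rate is {rate} Hz, expected {expected_rate} Hz"
    if channels != 1:  # assumption: single-channel input
        return False, f"{channels} channels found, expected mono"
    return True, "ok"


def write_silence(path, rate, seconds=1):
    """Write a silent 16-bit mono WAV file, used here only to demonstrate the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * rate * seconds)


with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, "good.wav")
    bad = os.path.join(tmp_dir, "bad.wav")
    write_silence(good, 16_000)
    write_silence(bad, 44_100)
    print(check_wav(good))
    print(check_wav(bad))
```

Files that fail the check would need to be resampled to 16 kHz with an audio tool of your choice before being passed to the processor.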

📚 Documentation

Community

Discord: https://bit.ly/discord-uds
Speech Recognition: https://t.me/speech_recognition_uk
Speech Synthesis: https://t.me/speech_synthesis_uk

See other Ukrainian models: https://github.com/egorsmkv/speech-recognition-uk

Overview

This model is the successor to https://huggingface.co/Yehor/w2v-bert-uk

Metrics

Acoustic model (AM), FP16 inference:
- WER: 0.1734 (17.34%)
- CER: 0.0333 (3.33%)
- Accuracy on words: 82.66%
- Accuracy on chars: 96.67%
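
The accuracy figures above are one minus the corresponding error rate (for example, 1 - 0.1734 = 0.8266). As a rough illustration of how WER and CER are typically computed, here is a sketch based on Levenshtein edit distance. The evaluation script used for this model is not shown in the source, so the function names and the absence of any text normalization are assumptions.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


reference = "добрий день усім"
hypothesis = "добрий ден усім"
print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"CER: {cer(reference, hypothesis):.4f}")
print(f"Word accuracy: {1 - wer(reference, hypothesis):.2%}")
```

In this toy pair, one of three words is wrong but only one of sixteen characters is missing, which shows why CER is usually much lower than WER for the same transcript.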

Demo

Try the model on your own audio files in the demo Space: https://huggingface.co/spaces/Yehor/w2v-bert-uk-v2.1-demo

Model Information

Property	Details
Base Model	facebook/w2v-bert-2.0
Library Name	transformers
Language	uk
License	apache-2.0
Task Categories	automatic-speech-recognition
Tags	audio
Datasets	Yehor/openstt-uk
Metrics	wer

Model Index

Name: w2v-bert-uk-v2.1
- Results:
  - Task:
    - Name: Automatic Speech Recognition
    - Type: automatic-speech-recognition
  - Dataset:
    - Name: common_voice_10_0
    - Type: common_voice_10_0
    - Config: uk
    - Split: test
    - Args: uk
  - Metrics:
    - Name: WER
      Type: wer
      Value: 17.34
    - Name: CER
      Type: cer
      Value: 3.33

📄 License

This model is licensed under the Apache-2.0 license.

📚 Cite this work

@misc {smoliakov_2025,
	author       = { {Smoliakov} },
	title        = { w2v-bert-uk-v2.1 (Revision 094c59d) },
	year         = 2025,
	url          = { https://huggingface.co/Yehor/w2v-bert-uk-v2.1 },
	doi          = { 10.57967/hf/4554 },
	publisher    = { Hugging Face }
}
