whisper-large-v2-pl-v2: Open-Source Speech Recognition Model for Polish Speech-to-Text Transcription

Whisper Large V2 Pl V2

Developed by bardsai

An automatic speech recognition model fine-tuned on Polish datasets based on Whisper Large v2, supporting Polish speech-to-text tasks.

Speech Recognition

Transformers

Other #Polish speech recognition #Low word error rate #Multi-dataset fine-tuning

Downloads 217

Release Time: 12/14/2022

Model Overview

This is an automatic speech recognition (ASR) model specifically optimized for Polish, fine-tuned on the Common Voice 11.0 and FLEURS datasets, capable of accurately converting Polish speech into text.

Model Features

High-precision Polish recognition

Achieves a 7.28% word error rate (WER) on the Polish test split of Common Voice 11.0

Multi-dataset training

Fine-tuned on the Polish portions of two datasets: Common Voice 11.0 and FLEURS

Optimized training process

Uses gradient accumulation (8 steps at a batch size of 8, for a total train batch size of 64) with a linear learning-rate schedule and 500 warmup steps

Model Capabilities

Polish speech recognition

Speech-to-text

Automatic speech transcription

Use Cases

Speech transcription

Automated meeting minutes

Automatically converts Polish meeting recordings into text transcripts

Highly accurate transcript text

Media subtitle generation

Automatically generates subtitles for Polish video content

Low error rate subtitle output

Voice assistants

Polish voice command recognition

Used for command understanding in Polish voice assistant systems

High accuracy command recognition
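For the subtitle use case listed above, the following is a minimal sketch of turning timestamped transcription chunks into SRT text. It assumes the chunks have the shape that the Hugging Face `transformers` ASR pipeline returns when called with `return_timestamps=True` (a list of dicts with a `timestamp` pair in seconds and a `text` string). The helper names `format_srt_time` and `chunks_to_srt` are hypothetical and not part of the model card.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks) -> str:
    """Convert [{'timestamp': (start, end), 'text': ...}, ...] into SRT text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: a final chunk may lack an end time; fall back to its start.
            end = start
        entries.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # A blank line separates consecutive SRT entries.
    return "\n".join(entries)
```

In practice the chunk list would come from the pipeline output (for example `result["chunks"]`), and the returned string can be written to a `.srt` file using UTF-8 so that Polish diacritics are preserved.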

🚀 Whisper Large v2 PL

This is a fine-tuned version of the Whisper model for the Polish language, achieving high performance in automatic speech recognition tasks.
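A minimal usage sketch with the Hugging Face `transformers` pipeline follows. The Hub identifier `bardsai/whisper-large-v2-pl-v2` is an assumption inferred from the page title and developer name, and `build_transcriber` and `transcribe` are hypothetical helper names. Decoding an audio file path also requires `ffmpeg` to be installed.

```python
# Assumed Hub id, inferred from the page title; verify before use.
MODEL_ID = "bardsai/whisper-large-v2-pl-v2"


def build_transcriber(device: int = -1):
    """Create an ASR pipeline; device=-1 runs on CPU, 0 on the first GPU."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # split long recordings into 30-second windows
        device=device,
    )


def transcribe(path: str, transcriber=None) -> str:
    """Return the transcript of one audio file as plain text."""
    if transcriber is None:
        transcriber = build_transcriber()
    result = transcriber(path)
    return result["text"].strip()
```

Building the pipeline once and passing it to `transcribe` for each file avoids reloading the model weights on every call.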

📚 Documentation

Model Information

Property	Details
Language	Polish (PL)
Tags	whisper-event, generated_from_trainer
Datasets	mozilla-foundation/common_voice_11_0, google/fleurs
Metrics	WER (Word Error Rate)

Model Performance

The model, named Whisper Large v2 PL, has the following results in Automatic Speech Recognition tasks:

Common Voice 11.0

Task: Automatic Speech Recognition
Dataset: mozilla-foundation/common_voice_11_0 (PL, test split)
Metrics:
- WER: 7.280175959972464
- WER: 7.31
- WER unnormalized: 20.18
- CER (Character Error Rate): 2.08
- MER (Match Error Rate): 7.27
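The metrics listed above follow standard definitions: WER is (S + D + I) / (H + S + D), MER is (S + D + I) / (H + S + D + I), and CER is the WER formula applied to characters, where H, S, D, and I are the hits, substitutions, deletions, and insertions in a minimum-edit alignment. The sketch below is a plain-Python illustration of those formulas, not the evaluation script used for this card, and returns fractions rather than percentages. All function names are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (hits, subs, dels, ins) for a minimum-edit alignment of two sequences."""
    n, m = len(ref), len(hyp)
    # dist[i][j] is the edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,
                             dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1)
    # Walk back from the end of both sequences to count each operation type.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            hits += 1; i -= 1; j -= 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            subs += 1; i -= 1; j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1; i -= 1
        else:
            ins += 1; j -= 1
    return hits, subs, dels, ins


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edits divided by the number of reference words."""
    hits, subs, dels, ins = align_counts(reference.split(), hypothesis.split())
    if hits + subs + dels == 0:
        raise ValueError("reference must contain at least one word")
    return (subs + dels + ins) / (hits + subs + dels)


def mer(reference: str, hypothesis: str) -> float:
    """Match error rate: edits divided by all aligned positions."""
    hits, subs, dels, ins = align_counts(reference.split(), hypothesis.split())
    if hits + subs + dels + ins == 0:
        raise ValueError("reference and hypothesis are both empty")
    return (subs + dels + ins) / (hits + subs + dels + ins)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: the WER formula applied to characters."""
    hits, subs, dels, ins = align_counts(list(reference), list(hypothesis))
    if hits + subs + dels == 0:
        raise ValueError("reference must contain at least one character")
    return (subs + dels + ins) / (hits + subs + dels)
```

Because insertions appear in the MER denominator but not in the WER denominator, MER can never exceed WER, which is consistent with the pairs reported in this card.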

facebook/voxpopuli

Task: Automatic Speech Recognition
Dataset: facebook/voxpopuli (PL, test split)
Metrics:
- WER: 9.61
- WER unnormalized: 30.33
- CER: 5.5
- MER: 9.45

google/fleurs

Task: Automatic Speech Recognition
Dataset: google/fleurs (pl_pl, test split)
Metrics:
- WER: 8.68
- WER unnormalized: 29.33
- CER: 3.63
- MER: 8.62
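The gap between "WER" and "WER unnormalized" in the results above comes from text normalization applied to references and hypotheses before scoring. The card does not state which normalizer was used, so the sketch below is only an assumption: it lowercases, replaces Unicode punctuation with spaces, and collapses whitespace. The function name `normalize` is hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    lowered = text.lower()
    # Unicode general categories starting with "P" cover all punctuation classes.
    without_punct = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in lowered
    )
    return " ".join(without_punct.split())


reference = "Dzień dobry, Panie Kowalski!"
hypothesis = "dzień dobry panie kowalski"

# Compared raw, the strings differ in casing and punctuation, which a
# word-level metric would count as errors; after normalization they match.
raw_match = reference == hypothesis
normalized_match = normalize(reference) == normalize(hypothesis)
```

A model that omits punctuation or casing is penalized heavily without normalization, which would be one plausible reason for the unnormalized figures being several times larger.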

Model Fine-Tuning

This model is a fine-tuned version of [bardsai/whisper-large-v2-pl](https://huggingface.co/bardsai/whisper-large-v2-pl) on the Common Voice 11.0 and FLEURS datasets. It achieves the following results on the evaluation set:

Loss: 0.3684
Wer: 7.2802

🔧 Technical Details

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 2100
mixed_precision_training: Native AMP
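Two of the values above can be checked with simple arithmetic. The sketch below shows how the total train batch size follows from the batch size and gradient accumulation, and what learning rate a linear schedule with warmup gives at a given step. It assumes a single training device (consistent with 8 × 8 = 64) and the usual shape of a linear warmup followed by linear decay to zero. The function names are hypothetical.

```python
LEARNING_RATE = 1e-05
WARMUP_STEPS = 500
TRAINING_STEPS = 2100
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 8


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update."""
    # Assumption: one device, since 8 * 8 already equals the reported 64.
    return per_device * accumulation * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


total_train_batch_size = effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
```

Under these assumptions the rate peaks at step 500 and falls to half the peak at step 1300, midway through the decay phase.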

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.0047	1.35	700	0.3428	8.5562
0.0011	2.7	1400	0.3605	7.5505
0.0003	4.05	2100	0.3684	7.2802
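The table shows validation loss rising slightly while WER keeps falling, so the choice of checkpoint depends on which metric is used for selection. The sketch below copies the three rows from the table and compares the two criteria. It also estimates optimizer steps per epoch from the step and epoch columns, which is a derived approximation rather than a figure stated in the card.

```python
# Rows copied from the training results table.
ROWS = [
    {"train_loss": 0.0047, "epoch": 1.35, "step": 700, "val_loss": 0.3428, "wer": 8.5562},
    {"train_loss": 0.0011, "epoch": 2.7, "step": 1400, "val_loss": 0.3605, "wer": 7.5505},
    {"train_loss": 0.0003, "epoch": 4.05, "step": 2100, "val_loss": 0.3684, "wer": 7.2802},
]

# Selecting by WER and by validation loss picks different checkpoints here.
best_by_wer = min(ROWS, key=lambda row: row["wer"])
best_by_loss = min(ROWS, key=lambda row: row["val_loss"])

# Approximate optimizer steps per epoch, derived from the last row.
steps_per_epoch = ROWS[-1]["step"] / ROWS[-1]["epoch"]

# Approximate examples per epoch, assuming the reported total batch size of 64.
examples_per_epoch = steps_per_epoch * 64
```

The reported final WER of 7.2802 corresponds to the step 2100 row, which is the best checkpoint by WER even though the step 700 row has the lowest validation loss.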

Framework versions

Transformers 4.26.0.dev0
PyTorch 1.13.0+cu117
Datasets 2.7.1.dev0
Tokenizers 0.13.2
