Whisper-large-v3-lv-late-cv19 Open-source Model - Supports Precise Speech-to-Text Conversion in Latvian

Whisper Large V3 Lv Late Cv19

Developed by AiLab-IMCS-UL

A Latvian automatic speech recognition model fine-tuned from whisper-large-v3 by AiLab.lv for Latvian speech-to-text tasks.

Speech Recognition

Safetensors

Other · Open Source License: Apache-2.0 · #Latvian speech recognition #Multi-domain adaptation #Low word error rate


Release Time : 10/15/2024

Model Overview

This model is a Latvian automatic speech recognition (ASR) model fine-tuned from OpenAI's whisper-large-v3 and optimized to transcribe Latvian audio into text.

Model Features

Multi-dataset Training

Trained on the Latvian part of Common Voice 19.0 combined with the LATE-Media 2.0 broadcast dataset: 282.4 hours of training data in total (212.6 h and 69.8 h respectively).

Multiple Quantized Versions

Provides 4-bit, 5-bit, and 8-bit quantized versions in GGML format for whisper.cpp, as well as an 8-bit quantized version for CTranslate2.
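Weight storage grows linearly with the number of bits per weight, which gives a rough sense of what the bit widths mean in practice. The sketch below assumes about 1.55 billion parameters, the published size of the Whisper large models. It ignores per-block scales, unquantized tensors, and file metadata, so the actual GGML and CTranslate2 files will be somewhat larger. `approx_size_gb` is a hypothetical helper.

```python
# Rough, lower-bound estimate of weight storage for different bit widths.
# Assumption: ~1.55e9 parameters (published size of the Whisper "large" models).
# Real model files are larger: block scales, unquantized layers and metadata are ignored.
APPROX_PARAMS = 1.55e9


def approx_size_gb(bits_per_weight: float, params: float = APPROX_PARAMS) -> float:
    """Return raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("fp16", 16), ("8-bit", 8), ("5-bit", 5), ("4-bit", 4)]:
        print(f"{label:>5}: ~{approx_size_gb(bits):.2f} GB")
```

Under these assumptions, the 4-bit version needs about a quarter of the fp16 storage. That makes it attractive on memory-constrained devices, typically at some cost in accuracy.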

Low Word Error Rate

Achieves a word error rate (WER) of 3.2% on the Latvian Common Voice 19.0 test set (VW split) after text normalization, and 4.8% on formatted text.

Model Capabilities

Latvian speech recognition

Audio-to-text conversion

Speech transcription

Use Cases

Speech Transcription

Broadcast Content Transcription

Automatically transcribe Latvian broadcast content into text

Achieves a 12.8% word error rate on the LATE-Media 1.0 test set after text normalization (19.2% on formatted text)

General Speech Transcription

Transcription of everyday Latvian speech

Achieves a 3.2% word error rate on the Latvian Common Voice 19.0 test set after text normalization (4.8% on formatted text)

🚀 General-purpose Latvian ASR model

A fine-tuned model for Latvian automatic speech recognition, based on whisper-large-v3.

This is a fine-tuned [whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) model for Latvian, trained by AiLab.lv using two general-purpose speech datasets: the Latvian part of Common Voice 19.0, and the latest version of the Latvian broadcast dataset [LATE-Media](https://korpuss.lv/id/LATE-mediji).

This version of the model supersedes the previous [whisper-large-v3-lv-late-cv17](https://huggingface.co/AiLab-IMCS-UL/whisper-large-v3-lv-late-cv17) model.

We also provide 4-bit, 5-bit and 8-bit quantized versions of the model in the GGML format for use with whisper.cpp, as well as an 8-bit quantized version for use with CTranslate2.
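Quantized formats of this kind typically store weights in small blocks, with one scale per block. The sketch below shows simplified symmetric 8-bit block quantization. The block size of 32 reflects our understanding of GGML's Q8_0 grouping, but the function names are hypothetical, and this is not the whisper.cpp or CTranslate2 implementation.

```python
import random
from typing import List, Tuple

BLOCK_SIZE = 32  # weights per block (assumed to mirror GGML's Q8_0 grouping)


def quantize_q8_blocks(weights: List[float], block_size: int = BLOCK_SIZE) -> List[Tuple[float, List[int]]]:
    """Symmetric per-block 8-bit quantization: one float scale + int8-range codes per block."""
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(w) for w in block)
        scale = amax / 127 if amax > 0 else 1.0  # all-zero block: any scale works
        codes = [round(w / scale) for w in block]  # integers in [-127, 127]
        blocks.append((scale, codes))
    return blocks


def dequantize_q8_blocks(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    """Reconstruct approximate weights as code * scale."""
    return [code * scale for scale, codes in blocks for code in codes]


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(100)]  # toy "layer" of 100 weights
    blocks = quantize_q8_blocks(weights)
    restored = dequantize_q8_blocks(blocks)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(blocks)} blocks, max abs error {worst:.2e}")
```

In this sketch, each restored weight is off by at most half of its block's scale. Lower bit widths such as 4-bit and 5-bit use a coarser grid per block in exchange for smaller files.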

✨ Features

Fine-tuned for Latvian on two general-purpose speech datasets.
Supersedes the previous whisper-large-v3-lv-late-cv17 model.
Provides quantized versions for whisper.cpp and CTranslate2.

📚 Documentation

Training

Fine-tuning was done using the Hugging Face Transformers library with a modified [seq2seq script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition#sequence-to-sequence).

Property	Details
Training data	Latvian Common Voice 19.0 train set (the [VW split](https://analyzer.cv-toolbox.web.tr)) and LATE-Media 2.0 train set
Total training hours	282.4

Training data	Hours
Latvian Common Voice 19.0 train set (the [VW split](https://analyzer.cv-toolbox.web.tr))	212.6
LATE-Media 2.0 train set	69.8
Total	282.4
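A few lines of arithmetic confirm the total and show how much of the mix each source contributes. The sketch only restates the figures in the training-data table. It does not imply that any particular sampling ratio was used during fine-tuning, and `mixture_shares` is a hypothetical helper.

```python
# Hours per training source, as listed in the training-data table.
TRAINING_HOURS = {
    "Latvian Common Voice 19.0 train (VW split)": 212.6,
    "LATE-Media 2.0 train": 69.8,
}


def mixture_shares(hours: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total hours and each source's fraction of the total."""
    total = sum(hours.values())
    return total, {name: h / total for name, h in hours.items()}


if __name__ == "__main__":
    total, shares = mixture_shares(TRAINING_HOURS)
    print(f"total: {total:.1f} h")
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```

About three quarters of the training audio comes from Common Voice and about a quarter from broadcast media.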

Evaluation

Property	Details
Testing data	Latvian Common Voice 19.0 test set (VW) and LATE-Media 1.0 test set
Evaluation metrics	Word Error Rate (WER) and Character Error Rate (CER)
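WER is the word-level edit distance (substitutions + deletions + insertions) between the hypothesis and the reference, divided by the number of reference words. CER applies the same definition at the character level. The sketch below implements these definitions in plain Python. The card does not say which scoring tool was used. Pooling errors over the whole test set (`corpus_wer`) is an assumption about how the reported scores were aggregated, and all function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for a single utterance (reference must be non-empty)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are counted as characters in this sketch."""
    return edit_distance(reference, hypothesis) / len(reference)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pool word errors over all (reference, hypothesis) pairs, then divide once."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        words += len(ref_words)
    return errors / words


if __name__ == "__main__":
    ref, hyp = "es runāju latviski", "es runāju latviešu"
    print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
```

Pooling counts before dividing weights long utterances more heavily than averaging per-utterance WER does. This is why the two aggregation methods can give different numbers on the same test set.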

Testing data	WER (%)	CER (%)
Latvian Common Voice 19.0 test set (VW) - formatted	4.8	1.6
Latvian Common Voice 19.0 test set (VW) - normalized	3.2	1.0
LATE-Media 1.0 test set - formatted	19.2	7.6
LATE-Media 1.0 test set - normalized	12.8	5.3
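The gap between the formatted and normalized rows comes from how strictly the texts are compared. The card does not define the normalization. The sketch below assumes it ignores letter case and punctuation, roughly as Whisper's basic text normalizer does. Under that assumption, a transcript that differs only in capitalization and punctuation counts as fully wrong when formatted but is error-free after normalization. `normalize` and `word_error_rate` are hypothetical helpers.

```python
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical normalizer: NFC, lowercase, punctuation -> space, collapse spaces."""
    text = unicodedata.normalize("NFC", text).lower()
    # Unicode categories starting with "P" are punctuation (commas, periods, quotes, ...).
    text = "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)
    return " ".join(text.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "Labdien, Rīga!"
    hypothesis = "labdien rīga"
    print("formatted: ", word_error_rate(reference, hypothesis))
    print("normalized:", word_error_rate(normalize(reference), normalize(hypothesis)))
```

If this assumption holds, casing and punctuation mistakes explain why each normalized score is lower than the corresponding formatted one.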

The Latvian CV 19.0 test set is available [here](https://analyzer.cv-toolbox.web.tr). The LATE-Media 1.0 test set is available here.

Citation

Please cite this paper if you use this model in your research:

@inproceedings{dargis-etal-2024-balsutalka-lv,
  author = {Dargis, Roberts and Znotins, Arturs and Auzina, Ilze and Saulite, Baiba and Reinsone, Sanita and Dejus, Raivis and Klavinska, Antra and Gruzitis, Normunds},
  title = {{BalsuTalka.lv - Boosting the Common Voice Corpus for Low-Resource Languages}},
  booktitle = {Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)},
  publisher = {ELRA and ICCL},
  year = {2024},
  pages = {2080--2085},
  url = {https://aclanthology.org/2024.lrec-main.187}
}

Acknowledgements

This work was supported by the EU Recovery and Resilience Facility project Language Technology Initiative (2.3.1.1.i.0/1/22/I/CFLA/002) in synergy with the State Research Programme project [LATE](https://www.digitalhumanities.lv/projekti/vpp-late/) (VPP-LETONIKA-2021/1-0006).

📄 License

This project is licensed under the Apache 2.0 license.
