Whisper-tiny-vi Open-source Vietnamese Speech Recognition Model - Free Deployment for Accurate Vietnamese Speech Recognition

Whisper Tiny Vi

Developed by doof-ferb

A Vietnamese automatic speech recognition (ASR) model fine-tuned from OpenAI's Whisper-tiny and evaluated on three Vietnamese test sets: 18.7% WER on VIVOS, 26.6% on Common Voice, and 37.1% on FLEURS.

Speech Recognition

Transformers

Open Source License: Apache-2.0

#Vietnamese speech recognition #Whisper fine-tuning #Low-resource optimization

Downloads 44

Release Time: 2/20/2024

Model Overview

This model is optimized for Vietnamese speech recognition. Fine-tuning on a large collection of Vietnamese speech data improves recognition accuracy for Vietnamese over the original Whisper-tiny model.

Model Features

Vietnamese optimization

Fine-tuned specifically on Vietnamese speech to reduce word error rate (WER) relative to the original Whisper-tiny model

Multi-dataset training

Trained using 10 different Vietnamese speech datasets, covering various speech scenarios

Lightweight

Based on Whisper-tiny architecture, suitable for deployment in resource-constrained environments

Model Capabilities

Vietnamese speech-to-text

Long audio transcription

Real-time speech recognition

Use Cases

Speech transcription

Vietnamese video subtitle generation

Automatically generate subtitles for Vietnamese video content

Reference result: 18.7% WER on the VIVOS test set

Voice assistant

Building Vietnamese voice interaction systems

Reference result: 26.6% WER on the Common Voice test set

Education

Language learning tool

Helping learners practice Vietnamese pronunciation and listening
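For the subtitle generation use case, recognized text has to be paired with timing information and written in a subtitle format. The sketch below converts timestamped segments into SRT text. It assumes segments shaped like the `chunks` list that the Transformers ASR pipeline returns when called with `return_timestamps=True`; the helper names `format_srt_time` and `chunks_to_srt`, the sample segments, and the 2-second fallback duration are illustrative assumptions, not part of the original model card.

```python
def format_srt_time(seconds):
    """Convert a time in seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Render timestamped chunks as SRT text.

    Each chunk is assumed to look like {"timestamp": (start, end), "text": "..."}.
    """
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: if the final segment has no end time, show it for 2 seconds.
            end = start + 2.0
        text = chunk["text"].strip()
        blocks.append(f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n")
    # A blank line separates consecutive SRT cues.
    return "\n".join(blocks)


# Hand-written sample segments (not real model output).
example = [
    {"timestamp": (0.0, 2.5), "text": " xin chào"},
    {"timestamp": (2.5, 5.04), "text": " cảm ơn"},
]
print(chunks_to_srt(example))
```

The result can be written to a `.srt` file next to the video; most players pick it up automatically when the file names match.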

🚀 doof-ferb/whisper-tiny-vi

This is a Whisper Tiny model fine-tuned on a large collection of Vietnamese speech datasets to improve automatic speech recognition in Vietnamese.

🚀 Quick Start

Prerequisites

Make sure you have torch and transformers installed.

Example Code

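import torch
from transformers import pipeline

# Use the GPU in half precision when available, otherwise fall back to CPU in float32.
DEVICE, DTYPE = ("cuda:0", torch.float16) if torch.cuda.is_available() else ("cpu", torch.float32)
PIPE = pipeline(task="automatic-speech-recognition", model="doof-ferb/whisper-tiny-vi", device=DEVICE, torch_dtype=DTYPE)
# Force Vietnamese transcription rather than relying on language auto-detection.
PIPE_KWARGS = {"language": "vi", "task": "transcribe"}

# Replace "audio.mp3" with the path to your own recording.
print(PIPE("audio.mp3", generate_kwargs=PIPE_KWARGS)["text"])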
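Long audio transcription is listed among the model's capabilities, but Whisper models process audio in 30-second windows, so longer recordings need to be split. One common approach is the chunking built into the Transformers pipeline. The sketch below is a minimal illustration under that assumption: `chunk_length_s`, `batch_size`, and `return_timestamps` are pipeline call arguments, while the helper names `long_audio_call_kwargs` and `transcribe_long`, the chosen values, and the CPU default are illustrative choices that do not come from the original model card.

```python
def long_audio_call_kwargs(chunk_length_s=30.0, batch_size=8):
    """Build keyword arguments for a chunked pipeline call (values are assumptions)."""
    return {
        "chunk_length_s": chunk_length_s,   # split the recording into windows of this length
        "batch_size": batch_size,           # number of windows decoded together
        "return_timestamps": True,          # also return per-segment (start, end) times
        "generate_kwargs": {"language": "vi", "task": "transcribe"},
    }


def transcribe_long(audio_path, device="cpu"):
    """Transcribe a long recording; the checkpoint is downloaded on first use."""
    # Imported lazily so that the helper above stays dependency-free.
    from transformers import pipeline

    asr = pipeline(
        task="automatic-speech-recognition",
        model="doof-ferb/whisper-tiny-vi",
        device=device,
    )
    return asr(audio_path, **long_audio_call_kwargs())


# Usage (hypothetical file name):
# result = transcribe_long("long_audio.mp3")
# print(result["text"])
# for chunk in result["chunks"]:
#     print(chunk["timestamp"], chunk["text"])
```

Larger batch sizes speed up decoding on a GPU at the cost of memory; on CPU a small batch size is usually the safer choice.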

✨ Features

Fine-tuned on a large collection of Vietnamese speech datasets to improve automatic speech recognition in Vietnamese.
Trained for 21k steps with a 5% warm-up and a batch size of 16×2 on Kaggle's free 2× T4 GPUs.

📦 Installation

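The original model card gives no installation steps beyond the prerequisites. Both packages are available from PyPI (pip install torch transformers). Passing a file path such as audio.mp3 to the pipeline also requires ffmpeg on the system, which Transformers uses to decode audio files.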

📚 Documentation

Datasets

The model was trained on the following datasets:

doof-ferb/vlsp2020_vinai_100h
doof-ferb/fpt_fosd
doof-ferb/infore1_25hours
doof-ferb/infore2_audiobooks
quocanh34/viet_vlsp
linhtran92/final_dataset_500hrs_wer0
linhtran92/viet_youtube_asr_corpus_v2
google/fleurs
mozilla-foundation/common_voice_16_1
vivos

Evaluation Results

Dataset	WER
Mozilla CommonVoice (Vietnamese) v16.1	26.6%
Google FLEURS (Vietnamese)	37.1%
ĐHQG TPHCM VIVOS	18.7%
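WER (word error rate) counts the word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence. It only illustrates the metric: the original card does not describe its text normalization (casing, punctuation) or evaluation script, so the function name `word_error_rate` and the sample sentences are assumptions, and this snippet alone will not reproduce the figures in the table.

```python
def word_error_rate(reference, hypothesis):
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from the hypothesis
                curr[j - 1] + 1,     # insertion: extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hand-written example: one substituted word out of five reference words.
wer = word_error_rate("tôi đang học tiếng việt", "tôi đang học tiến việt")
print(f"WER: {wer:.1%}")
```

For a whole test set, the usual convention is to sum the edit distances over all utterances and divide by the total number of reference words, rather than averaging per-utterance rates.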

TODO List

[x] Train, then publish the checkpoint
[x] Evaluate WER on Common Voice, FLEURS, and VIVOS
[ ] Convert to openai-whisper, whisper.cpp, faster-whisper
[ ] Convert to ONNX, to try with https://github.com/k2-fsa/sherpa-onnx and https://github.com/zhuzilin/whisper-openvino
[ ] Convert to TensorRT: https://github.com/openai/whisper/discussions/169

🔧 Technical Details

The model is based on openai/whisper-tiny and fine-tuned on Vietnamese speech datasets. Training ran for 21k steps with a 5% warm-up and a batch size of 16×2 on Kaggle's free 2× T4 GPUs.
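The training figures above imply a few derived quantities, worked out in the short calculation below. It assumes that "16×2" means 32 examples per optimizer step; the source does not say whether the factor of 2 comes from the two GPUs or from gradient accumulation, and the variable names are illustrative.

```python
TOTAL_STEPS = 21_000          # from the model card
WARMUP_RATIO = 0.05           # 5% warm-up, from the model card
EXAMPLES_PER_STEP = 16 * 2    # assumption: "16×2" means 32 examples per optimizer step

# Number of steps spent ramping the learning rate up.
warmup_steps = round(TOTAL_STEPS * WARMUP_RATIO)
# Upper bound on training examples processed (repeats across epochs included).
examples_processed = TOTAL_STEPS * EXAMPLES_PER_STEP

print(f"warm-up steps: {warmup_steps}")
print(f"examples processed: {examples_processed}")
```

Under these assumptions the warm-up covers the first 1,050 steps, and the run processes 672,000 training examples in total.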

📄 License

This project is licensed under the Apache-2.0 license.
