PhoWhisper-medium Open-source Speech Recognition Model - Free Deployment for Accurate Vietnamese Speech Recognition

PhoWhisper-medium

Developed by VinAI

PhoWhisper is a series of models designed specifically for Vietnamese automatic speech recognition (ASR). Its robustness comes from fine-tuning the Whisper model on an 844-hour dataset that covers a range of Vietnamese accents.

Speech Recognition

Transformers

Open Source License: BSD-3-Clause

#Vietnamese speech recognition #Multi-accent adaptation #Whisper fine-tuning

Downloads 2,999

Release Time: 2/18/2024

Model Overview

PhoWhisper is available in five versions, all focused on Vietnamese automatic speech recognition. At release, it achieved state-of-the-art performance on Vietnamese ASR benchmark datasets.

Model Features

Multi-accent adaptation

Fine-tuned on an 844-hour dataset covering different Vietnamese accents, which makes the model robust to accent variation

State-of-the-art performance

Achieved state-of-the-art performance on Vietnamese ASR benchmark datasets at release

Multiple version options

Provides five different versions of the model to meet different needs

Model Capabilities

Vietnamese speech recognition

Multi-accent speech processing
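
A minimal sketch of how these capabilities might be used through the Transformers library listed above. It assumes the model is published on the Hugging Face Hub as vinai/PhoWhisper-medium and that the transformers package and an audio backend such as ffmpeg are installed. The helper names load_transcriber and transcribe are illustrative, not part of any official API.

```python
MODEL_ID = "vinai/PhoWhisper-medium"  # assumed Hugging Face Hub ID


def load_transcriber(model_id: str = MODEL_ID):
    """Build a Transformers ASR pipeline (downloads weights on first call)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # chunk_length_s splits long recordings into 30-second windows.
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,
    )


def transcribe(audio_path: str, transcriber) -> str:
    """Return the transcript text for one audio file.

    `transcriber` is any callable that maps an audio path to a dict
    with a "text" key, such as the pipeline from load_transcriber().
    """
    result = transcriber(audio_path)
    return result["text"].strip()
```

Typical use would be transcribe("meeting.wav", load_transcriber()), where meeting.wav is a hypothetical recording. Passing the transcriber in as an argument keeps the loaded model reusable across many files.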

Use Cases

Speech transcription

Vietnamese meeting records

Automatically transcribe Vietnamese meeting recordings into text

Highly accurate transcription results

Media subtitle generation

Automatically generate subtitles for Vietnamese video content

Supports subtitle generation for speech in multiple Vietnamese accents
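
The subtitle use case can be sketched as a small post-processing step. The Transformers ASR pipeline, when called with return_timestamps=True, returns a "chunks" list whose items hold a "timestamp" pair (start, end in seconds) and a "text" string. The sketch below turns such a list into SubRip (SRT) text. The function names are illustrative, and falling back to the start time when the end time is missing is a simplifying assumption.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: Iterable[Mapping]) -> str:
    """Convert timestamped ASR chunks into SRT subtitle text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # The final chunk may lack an end time; reuse the start (assumption).
            end = start
        text = chunk["text"].strip()
        entries.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(entries)
```

Given a pipeline object named asr, the chunks would come from asr("video_audio.wav", return_timestamps=True)["chunks"], where the file name is hypothetical. The resulting string can be written to a .srt file with UTF-8 encoding so Vietnamese diacritics are preserved.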

🚀 PhoWhisper: Automatic Speech Recognition for Vietnamese

🚀 Quick Start

📄 License

📚 Documentation