A speech recognition model based on Whisper Large-v3 Turbo, localized for Vietnamese and supporting 11 languages in total, suited to scenarios such as real-time transcription
Model Features
Ultra-fast Response
Processes 30 seconds of audio in approximately 350 milliseconds, ideal for real-time transcription
Multilingual Support
Supports 11 languages, including all 8 regional accents of Vietnamese
High Accuracy
Word Error Rate (WER) of about 12% for major languages, capable of recognizing various accents
Large-scale Training
Trained on a dataset of 600,000 samples (approximately 1,000 hours) of real-world audio
Open Source and Free
Released under the MIT License, which permits free use, modification, and redistribution
Model Capabilities
Speech recognition
Real-time transcription
Multilingual processing
Accent recognition
Use Cases
Real-time Transcription
Meeting Minutes
Real-time transcription of meeting content
Near-real-time text generation
Live Captioning
Generating instant subtitles for live events
Low-latency subtitle output
Voice Assistants
Voice-controlled Applications
Developing responsive voice control interfaces
High-accuracy voice command recognition
Accessibility Tools
Hearing Assistance
Providing speech-to-text services for the hearing impaired
Real-time speech-to-text conversion
🚀 EraX-WoW-Turbo V1.1: Whisper Large-v3 Turbo for Vietnamese and then some, Supercharged and Localized!
EraX-WoW-Turbo is a speech recognition model built on Whisper Large-v3 Turbo. It offers fast, accurate speech recognition for a range of applications.
🚀 Quick Start
Get ready to experience the power of EraX-WoW-Turbo! You can download the model and start testing it right away.
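A minimal sketch of loading the model through the Hugging Face `transformers` speech recognition pipeline. It assumes the checkpoint at `erax-ai/EraX-WoW-Turbo-V1.1` (the repository URL given in the citation) loads like a standard Whisper checkpoint. The helper names, the default options, and the file `sample.wav` are illustrative and do not come from the EraX documentation.

```python
MODEL_ID = "erax-ai/EraX-WoW-Turbo-V1.1"  # repository URL from the citation section


def build_pipeline_kwargs(device="cpu"):
    """Keyword arguments for transformers.pipeline (illustrative defaults)."""
    return {
        "task": "automatic-speech-recognition",
        "model": MODEL_ID,
        "chunk_length_s": 30,  # Whisper operates on 30-second windows
        "device": device,
    }


def transcribe(audio_path, language="vietnamese", device="cpu"):
    """Transcribe one audio file; the model is downloaded on first use."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    asr = pipeline(**build_pipeline_kwargs(device))
    result = asr(
        audio_path,
        generate_kwargs={"language": language, "task": "transcribe"},
    )
    return result["text"]
```

With `transformers` and an audio backend installed, `print(transcribe("sample.wav"))` would print the transcript of a hypothetical local file.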
✨ Features
Blazing Fast
With the optimizations in the Turbo architecture and the CTranslate2 library, it can achieve real-time transcription. It can process 30 seconds of audio in about 350 ms, leaving the original Medium model far behind.
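To put the quoted figure in perspective, the real-time factor (processing time divided by audio duration) can be worked out directly. The snippet below only restates the numbers given above (350 ms for 30 seconds of audio); the function name is illustrative.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


rtf = real_time_factor(0.350, 30.0)
speedup = 1.0 / rtf
print(f"RTF = {rtf:.4f}, about {speedup:.0f}x faster than real time")
# prints: RTF = 0.0117, about 86x faster than real time
```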
Multilingual Maestro
EraX-WoW-Turbo is fine-tuned on a diverse dataset covering 11 key languages:
Vietnamese (accents from all 8 regions)
English (US)
Chinese (Mandarin)
Cantonese
Indonesian
Korean
Japanese
Russian
German
French
Dutch
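When calling the model, it is often useful to pin the language rather than rely on auto-detection. The mapping below is a small sketch that pairs the 11 languages listed above with the language codes used by Whisper large-v3. That the fine-tuned model accepts these codes unchanged is an assumption, and the helper name is illustrative.

```python
# Whisper large-v3 language codes for the 11 languages listed above.
LANGUAGE_CODES = {
    "Vietnamese": "vi",
    "English": "en",
    "Chinese (Mandarin)": "zh",
    "Cantonese": "yue",
    "Indonesian": "id",
    "Korean": "ko",
    "Japanese": "ja",
    "Russian": "ru",
    "German": "de",
    "French": "fr",
    "Dutch": "nl",
}


def language_code(name):
    """Case-insensitive lookup of a language name; raises KeyError if unsupported."""
    wanted = name.strip().lower()
    for language, code in LANGUAGE_CODES.items():
        if language.lower() == wanted:
            return code
    raise KeyError(f"unsupported language: {name!r}")
```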
Accuracy You Can Trust
Although the benchmark results are still being finalized, preliminary tests show an impressive Word Error Rate (WER) of around 12% across major languages, including challenging Vietnamese dialects.
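For readers who want to reproduce a WER measurement on their own data, here is a minimal sketch of the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The card does not say how its 12% figure was computed (for example, which text normalization was applied), so this is not necessarily the EraX evaluation procedure.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
wer = word_error_rate("xe của chị đang ở đâu", "xe của chị ở đâu")
print(f"{wer:.3f}")  # prints: 0.167
```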
Trained with Care
The model is trained on about 600,000 samples (roughly 1,000 hours) of real-world audio, including noisy conditions.
Open Source (MIT License)
It is released under the MIT License, which permits free use, modification, and redistribution provided the copyright and license notice are retained.
Try it
You can try the model with the following audio sample:
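"Chị Lan Anh ơi, em xin lỗi vì sự cố mất sóng vừa rồi. Em đã ghi nhận được hầu hết thông tin rồi ạ. Bây giờ em muốn hỏi chị là hiện tại xe của chị đang ở đâu ạ? Xe vẫn còn ở hiện trường hay đã được di chuyển đến gara hay nơi nào khác?"
English translation: "Ms. Lan Anh, I'm sorry about the signal loss just now. I have already recorded most of the information. Now I'd like to ask where your car is at the moment. Is it still at the scene, or has it been moved to a garage or somewhere else?"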
🔧 Technical Details
Turbocharging Performance (CTranslate2)
You can unlock even more speed by using EraX-WoW-Turbo with the CTranslate2 library (https://github.com/OpenNMT/CTranslate2), which can potentially achieve a 2.5x speedup. This makes it ideal for applications requiring the lowest latency.
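One possible route to CTranslate2 inference is sketched below: convert the Transformers checkpoint with CTranslate2's `TransformersConverter`, then run it with `faster-whisper`, a separate library built on CTranslate2. The card does not describe its own conversion steps, so the copied file names, the `float16` quantization, the CUDA device, and all helper names here are assumptions.

```python
def convert_to_ctranslate2(model_id, output_dir, quantization="float16"):
    """Convert a Transformers Whisper checkpoint to CTranslate2 format."""
    # Imported lazily so this module loads without ctranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(
        model_id,
        # Assumption: the source repository ships these two files.
        copy_files=["tokenizer.json", "preprocessor_config.json"],
    )
    return converter.convert(output_dir, quantization=quantization)


def join_texts(texts):
    """Join segment texts into one transcript string."""
    return " ".join(text.strip() for text in texts)


def transcribe_ct2(model_dir, audio_path, language="vi"):
    """Transcribe a file with a converted model (assumes a CUDA-capable GPU)."""
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path, language=language, beam_size=5)
    return join_texts(segment.text for segment in segments)
```

A typical flow would call `convert_to_ctranslate2("erax-ai/EraX-WoW-Turbo-V1.1", "erax-ct2")` once, then `transcribe_ct2("erax-ct2", "sample.wav")` for each file; both paths are hypothetical.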
📚 Documentation
Use Cases
Real-time Transcription: Suitable for live captioning, meetings, interviews, etc.
Voice Assistants: Build responsive and accurate voice-controlled applications.
Media Subtitling: Generate subtitles for videos and podcasts quickly and accurately.
Accessibility Tools: Empower individuals with hearing impairments.
Language Learning: Practice pronunciation and receive instant feedback.
Combined with EraX Translator: Combine it with the upcoming EraX translator (around 100ms/sentence latency) for a complete multilingual communication solution, such as instant translation for international conferences or travel apps.
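For the subtitling and captioning use cases, transcript segments with timestamps need to be rendered in a subtitle format. The sketch below formats `(start_seconds, end_seconds, text)` tuples as SubRip (SRT); the tuple layout is an assumption made for illustration, not the model's output format.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as an SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Xin chào"), (2.5, 5.0, "Hello")]))
```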
Limitations
This model is trained on adult speech and might struggle with the high-pitched cries of infants or very quiet, hushed whispers.
🤝 Get Involved!
We encourage you to:
Try it out: Download the model and test it.
Provide feedback: Let us know what works, what doesn't, and what features you'd like to see.
Contribute: If you're a developer, consider contributing to the project.
The EraX Team is committed to continuously improving our models. Stay tuned for future updates!
📄 License
Released under the MIT License, the same license as Whisper.
📝 Citation
If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
@article{erax_wow_turbo_v1_1,
title={EraX-WoW-Turbo-V1.1: Lắng nghe để Yêu thương.},
author={Nguyễn Anh Nguyên - Phạm Huỳnh Nhật - Cty Bảo hiểm AAA (504h)},
organization={EraX},
year={2025},
url={https://huggingface.co/erax-ai/EraX-WoW-Turbo-V1.1}
}