EraX-WoW-Turbo-V1.1 Open Source Speech Recognition Model - Supports Multiple Languages, Fast and Accurate Vietnamese Recognition

Erax WoW Turbo V1.1

Developed by erax-ai

A Whisper Large-v3 Turbo speech recognition model optimized for Vietnamese, supporting multiple languages with ultra-fast response and high accuracy

Speech Recognition

Transformers

Open Source License: MIT

Tags: Vietnamese optimization, Real-time speech transcription, Multilingual recognition

Downloads 666

Release Time: 3/30/2025

Model Overview

A speech recognition model optimized based on Whisper Large-v3 Turbo, specifically localized for Vietnamese while supporting 11 languages, suitable for various scenarios like real-time transcription

Model Features

Ultra-fast Response

Processes 30 seconds of audio in approximately 350 milliseconds, ideal for real-time transcription

Multilingual Support

Supports 11 languages, including all 8 regional accents of Vietnamese

High Accuracy

Word Error Rate (WER) of about 12% for major languages, capable of recognizing various accents

Large-scale Training

Trained on a dataset of 600,000 samples (approximately 1,000 hours) of real-world audio

Open Source and Free

Released under the MIT license, which permits free use, modification, and redistribution as long as the license notice is retained

Model Capabilities

Speech recognition

Real-time transcription

Multilingual processing

Accent recognition

Use Cases

Real-time Transcription

Meeting Minutes: Real-time transcription of meeting content, with almost real-time text generation.
Live Captioning: Generating instant subtitles for live events, with low-latency subtitle output.

Voice Assistants

Voice-controlled Applications: Developing responsive voice control interfaces, with high-accuracy voice command recognition.

Accessibility Tools

Hearing Assistance: Providing speech-to-text services for the hearing impaired, with real-time speech-to-text conversion.

🚀 EraX-WoW-Turbo V1.1: Whisper Large-v3 Turbo for Vietnamese and then some, Supercharged and Localized!

EraX-WoW-Turbo is a speech recognition model built upon the impressive Whisper Large-v3 Turbo. It offers lightning-fast and highly accurate speech recognition, suitable for various applications.

🚀 Quick Start

Get ready to experience the power of EraX-WoW-Turbo! You can download the model and start testing it right away.
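One possible way to test the model is through the Hugging Face Transformers `pipeline` API. The sketch below is an assumption, not the authors' documented procedure: it takes the model ID from the Hugging Face URL given in the citation, assumes the checkpoint loads as a standard Whisper model, and expects you to supply your own audio file. The helper names `build_generate_kwargs` and `transcribe` are hypothetical.

```python
MODEL_ID = "erax-ai/EraX-WoW-Turbo-V1.1"  # from the model's Hugging Face URL


def build_generate_kwargs(language="vi"):
    """Decoding options for Whisper: fix the language and request transcription."""
    return {"language": language, "task": "transcribe"}


def transcribe(audio_path, language="vi"):
    """Transcribe one audio file; the model is downloaded on first use."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path, generate_kwargs=build_generate_kwargs(language))
    return result["text"]
```

Calling `transcribe("sample.wav")`, where `sample.wav` is a placeholder for your own recording, should return the transcript as a string. For recordings longer than 30 seconds, passing `chunk_length_s=30` to `pipeline` is one way to process the audio in windows.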

✨ Features

Blazing Fast

With the optimizations in the Turbo architecture and the amazing CTranslate2 library, it can achieve real-time transcription. It can process 30 seconds of audio in about 350 ms, leaving the original Medium model far behind.
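To put the quoted figures in perspective, the real-time factor (processing time divided by audio duration) can be worked out directly. This is plain arithmetic on the numbers above, not a benchmark; actual speed will depend on hardware and settings that the source does not specify.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


rtf = real_time_factor(0.350, 30.0)  # 350 ms for 30 s of audio, as quoted
speedup = 1.0 / rtf                  # seconds of audio handled per second of compute
print(f"RTF = {rtf:.4f}, roughly {speedup:.0f}x faster than real time")
```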

Multilingual Maestro

EraX-WoW-Turbo is fine-tuned on a diverse dataset covering 11 key languages:

Vietnamese (covering all 8 regions with accents)
English (US)
Chinese (Mandarin)
Cantonese
Indonesian
Korean
Japanese
Russian
German
French
Dutch

Accuracy You Can Trust

Although the benchmark results are still being finalized, preliminary tests show an impressive Word Error Rate (WER) of around 12% across major languages, including challenging Vietnamese dialects.
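For readers who want to check accuracy on their own recordings, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal implementation of that standard definition; it is not the EraX team's evaluation script, and it applies no text normalization (casing, punctuation), which real evaluations usually do.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
wer = word_error_rate("xe của chị đang ở đâu", "xe của chị đang ở đây")
```

A WER of about 12% therefore means roughly one word error for every eight reference words.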

Trained with Care

The model is trained on a substantial dataset of about 600,000 samples (roughly 1,000 hours), so it can handle real-world audio conditions such as noise.
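As a rough sanity check on these figures, the average clip length follows from dividing total duration by sample count. Both inputs are approximate in the source, so the result is only an estimate.

```python
samples = 600_000  # approximate sample count quoted above
hours = 1_000      # approximate total duration quoted above

avg_seconds = hours * 3600 / samples
print(f"Average clip length: about {avg_seconds:.1f} s")
```

Clips of this length sit well inside the 30-second window that Whisper models process at a time.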

Open Source (MIT License)

It is released under the MIT License, so you can use, modify, and redistribute it freely as long as the license notice is retained.

Try it

You can try the model with the following audio sample:

"Chị Lan Anh ơi, em xin lỗi vì sự cố mất sóng vừa rồi. Em đã ghi nhận được hầu hết thông tin rồi ạ. Bây giờ em muốn hỏi chị là hiện tại xe của chị đang ở đâu ạ? Xe vẫn còn ở hiện trường hay đã được di chuyển đến gara hay nơi nào khác?"

🔧 Technical Details

Turbocharging Performance (CTranslate2)

You can unlock even more speed by using EraX-WoW-Turbo with the CTranslate2 library (https://github.com/OpenNMT/CTranslate2), which can potentially achieve a 2.5x speedup. This makes it ideal for applications requiring the lowest latency.
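The source does not say how to run the model under CTranslate2, so the following is only a sketch under stated assumptions. It converts the checkpoint with CTranslate2's `TransformersConverter` and runs it with the separate `faster-whisper` package, which wraps CTranslate2 for Whisper models. The output directory name, the helper names, and the presence of `tokenizer.json` and `preprocessor_config.json` in the model repository are assumptions.

```python
MODEL_ID = "erax-ai/EraX-WoW-Turbo-V1.1"
CT2_DIR = "erax-wow-turbo-ct2"  # hypothetical local output directory


def convert_to_ctranslate2(output_dir=CT2_DIR, quantization="float16"):
    """One-time conversion; needs ctranslate2, transformers, and torch installed."""
    from ctranslate2.converters import TransformersConverter

    # Assumption: these two files exist in the model repository.
    converter = TransformersConverter(
        MODEL_ID, copy_files=["tokenizer.json", "preprocessor_config.json"]
    )
    return converter.convert(output_dir, quantization=quantization)


def join_segments(segments):
    """Concatenate the text of each decoded segment into one transcript."""
    return " ".join(segment.text.strip() for segment in segments)


def transcribe_ct2(audio_path, model_dir=CT2_DIR, language="vi",
                   device="cuda", compute_type="float16"):
    """Transcribe with faster-whisper on top of the converted model."""
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(audio_path, language=language)
    return join_segments(segments)
```

On a machine without a GPU, `device="cpu"` with `compute_type="int8"` is a common alternative, though the speed figures quoted in this document would not be expected to carry over.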

📚 Documentation

Use Cases

Real-time Transcription: Suitable for live captioning, meetings, interviews, etc.
Voice Assistants: Build responsive and accurate voice-controlled applications.
Media Subtitling: Generate subtitles for videos and podcasts quickly and accurately.
Accessibility Tools: Empower individuals with hearing impairments.
Language Learning: Practice pronunciation and receive instant feedback.
Combined with EraX Translator: Combine it with the upcoming EraX translator (around 100ms/sentence latency) for a complete multilingual communication solution, such as instant translation for international conferences or travel apps.
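For the media subtitling use case, timestamped transcript segments can be written out in the common SubRip (SRT) format. The sketch below assumes you already have `(start, end, text)` tuples with times in seconds from whichever inference library you use; the function names are hypothetical and nothing here is specific to EraX-WoW-Turbo.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


subtitles = to_srt([(0.0, 2.5, "Xin chào"), (2.5, 4.0, "Cảm ơn")])
```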

Limitations

This model is trained on adult speech and might struggle with the high-pitched cries of infants or very quiet, hushed whispers.

🤝 Get Involved!

We encourage you to:

Try it out: Download the model and test it.
Provide feedback: Let us know what works, what doesn't, and what features you'd like to see.
Contribute: If you're a developer, consider contributing to the project.

The EraX Team is committed to continuously improving our models. Stay tuned for future updates!

📄 License

Released under the MIT License, the same license as Whisper.

📝 Citation

If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:

@article{erax_wow_turbo_v1_1,
  title={EraX-WoW-Turbo-V1.1: Lắng nghe để Yêu thương.},
  author={Nguyễn Anh Nguyên - Phạm Huỳnh Nhật - Cty Bảo hiểm AAA (504h)},
  organization={EraX},
  year={2025},
  url={https://huggingface.co/erax-ai/EraX-WoW-Turbo-V1.1}
}
