A localized Vietnamese-enhanced version of Whisper Large-v3 Turbo optimized with CTranslate2, supporting multilingual speech recognition with high speed and accuracy
This is an optimized speech-to-text model based on the Whisper Large-v3 Turbo architecture, specially enhanced for Vietnamese while supporting multiple languages. The model is optimized with CTranslate2, providing ultra-fast transcription capabilities.
Model Features
Ultra-fast transcription: Processes 30 seconds of audio in approximately 350 ms, supporting real-time transcription.
Multilingual support: Supports 11 languages, with special optimization for 8 Vietnamese regional accents.
High accuracy: Achieves a word error rate (WER) of about 12% for major languages and handles a wide range of accents.
CTranslate2 optimization: Achieves a roughly 2.5x speedup through the CTranslate2 library, suitable for low-latency applications.
Model Capabilities
Speech-to-text
Multilingual recognition
Real-time transcription
Accent adaptation
Use Cases
Real-time transcription
Meeting minutes: Real-time transcription of meeting content into near real-time text records.
Interview records: Automatic transcription of interview audio into fast and accurate interview records.
Accessibility tools
Hearing assistance: Real-time captions for hearing-impaired individuals, improving communication accessibility.
Media production
Video subtitles: Automatic, fast, and accurate subtitle generation for videos.
🚀 EraX-WoW-Turbo V1.1-CT2: Whisper Large-v3 Turbo with CTranslate2 for Vietnamese and then some, Supercharged and Localized!
EraX-WoW-Turbo V1.1-CT2 is a powerful speech recognition model. Built upon Whisper Large-v3 Turbo, it offers real-time transcription, multilingual support, and high accuracy, and was trained on a large dataset (roughly 1,000 hours of audio). It is open-source under the MIT License, providing a great solution for various speech-related applications.
🚀 Quick Start
To start using EraX-WoW-Turbo V1.1-CT2, you need to install the necessary packages and run the provided Python code.
from faster_whisper import WhisperModel
from pydub import AudioSegment

model_path = "erax-ai/EraX-WoW-Turbo-V1.1-CT2"

# Convert the audio to MONO, 16 kHz if necessary
def convert16k(audio_path):
    audio = AudioSegment.from_file(audio_path, format="wav")
    audio = audio.split_to_mono()[0]       # keep the first channel only
    audio = audio.set_frame_rate(16000)    # resample to 16 kHz
    audio.export("test.wav", format="wav")
    return True

convert16k("your_audio.wav")  # replace with the path to your recording

# Run on GPU with BF16 precision
fast_model = WhisperModel(model_path, device="cuda", compute_type="bfloat16")

segments, info = fast_model.transcribe(
    "test.wav",
    beam_size=5,
    # word_timestamps=True,
    language="vi",
    temperature=0.0,
    vad_filter=True,
    # vad_parameters=dict(min_silence_duration_ms=2000),
)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
✨ Features
Blazing Fast
With the CTranslate2 library, EraX-WoW-Turbo can achieve real-time transcription: it can process 30 seconds of audio in about 350 ms, much faster than the original Medium model.
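As a rough way to verify this on your own hardware, the sketch below times a single transcription call, reusing fast_model from the Quick Start above; the file name test.wav is a placeholder for a 16 kHz mono recording. Note that transcribe returns a lazy generator, so the segments must be consumed before the timer is stopped.

import time

start = time.perf_counter()
segments, info = fast_model.transcribe("test.wav", language="vi", beam_size=5)
text = " ".join(segment.text for segment in segments)  # decoding actually runs while the generator is consumed
elapsed = time.perf_counter() - start
print("Transcribed %.1fs of audio in %.0f ms" % (info.duration, elapsed * 1000))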
Multilingual Maestro
The model is fine-tuned on a diverse dataset covering 11 key languages, including Vietnamese, English (US), Chinese (Mandarin), Cantonese, Indonesian, Korean, Japanese, Russian, German, French, and Dutch.
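If the input language is not known in advance, the language argument can simply be omitted and faster-whisper will detect it automatically. A minimal sketch, again reusing fast_model from the Quick Start (sample.wav is a placeholder):

# Omit `language` to let the model detect it automatically.
segments, info = fast_model.transcribe("sample.wav", beam_size=5, vad_filter=True)
print("Detected language: %s (probability %.2f)" % (info.language, info.language_probability))
for segment in segments:
    print(segment.text)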
Accuracy You Can Trust
Preliminary tests show an impressive WER (Word Error Rate) around 12% across major languages, including challenging Vietnamese dialects.
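To measure WER on your own data, one option is the third-party jiwer package (an assumption here, not part of this project), which compares a reference transcript with the model output:

from jiwer import wer

reference = "xin chào tất cả mọi người"     # ground-truth transcript (illustrative)
hypothesis = "xin chào tất cả mọi người ạ"  # model output (illustrative)
print("WER: %.1f%%" % (100 * wer(reference, hypothesis)))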
Trained with Care
It was trained on a substantial dataset (600,000 samples, roughly 1,000 hours) covering real-world audio conditions, so it handles noise well.
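For long or noisy recordings, the Silero VAD filter built into faster-whisper can be tuned to skip non-speech before decoding. A minimal sketch, reusing fast_model; the file name and parameter value are placeholders:

segments, info = fast_model.transcribe(
    "noisy_meeting.wav",
    language="vi",
    vad_filter=True,                                   # drop non-speech segments before decoding
    vad_parameters=dict(min_silence_duration_ms=500),  # treat pauses of 0.5 s or more as silence
)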
Open Source (MIT License)
The model is open-source under the MIT License, allowing free commercial and non-commercial use, modification, and redistribution with minimal restrictions.
Try it
You can try the model with the following audio sample:
"Chị Lan Anh ơi, em xin lỗi vì sự cố mất sóng vừa rồi. Em đã ghi nhận được hầu hết thông tin rồi ạ. Bây giờ em muốn hỏi chị là hiện tại xe của chị đang ở đâu ạ? Xe vẫn còn ở hiện trường hay đã được di chuyển đến gara hay nơi nào khác?"
📦 Installation
To use EraX-WoW-Turbo V1.1-CT2, install the faster-whisper package (which pulls in CTranslate2 as a dependency) and pydub for the audio conversion step, for example with pip install faster-whisper pydub.
The basic usage is shown in the code example above: convert your audio to mono 16 kHz, then run the transcription with faster-whisper.
Advanced Usage
Performance can be tuned further through the CTranslate2 backend, which is what provides the roughly 2.5x speedup over the original implementation and makes the model suitable for applications requiring the lowest possible latency. In particular, the precision used for inference can be selected when loading the model, as shown below.
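A minimal sketch of the available precision settings via the compute_type argument of WhisperModel; the exact speedup and memory savings depend on your hardware:

from faster_whisper import WhisperModel

model_path = "erax-ai/EraX-WoW-Turbo-V1.1-CT2"
cpu_model = WhisperModel(model_path, device="cpu", compute_type="int8")           # 8-bit weights on CPU
gpu_model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")  # 8-bit weights, FP16 compute on GPU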
📚 Documentation
Use Cases
Real-time Transcription: Suitable for live captioning, meetings, interviews, etc.
Voice Assistants: Build responsive and accurate voice-controlled applications.
Media Subtitling: Generate subtitles for videos and podcasts quickly and accurately (see the sketch after this list).
Accessibility Tools: Empower individuals with hearing impairments.
Language Learning: Practice pronunciation and receive instant feedback.
Multilingual Communication: Combine it with the upcoming EraX translator for a complete multilingual communication solution, such as for international conferences or travel apps.
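For the media-subtitling use case above, the segment timestamps returned by transcribe map directly onto the SRT format. A minimal sketch, reusing fast_model from the Quick Start; the input and output file names are placeholders:

def srt_timestamp(seconds):
    # Format seconds as an SRT timestamp: HH:MM:SS,mmm
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, ms)

segments, info = fast_model.transcribe("video_audio.wav", language="vi", vad_filter=True)
with open("subtitles.srt", "w", encoding="utf-8") as f:
    for i, segment in enumerate(segments, start=1):
        f.write("%d\n%s --> %s\n%s\n\n" % (
            i, srt_timestamp(segment.start), srt_timestamp(segment.end), segment.text.strip()))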
Limitations
This model is trained on adult speech and might struggle with the high-pitched cries of infants or very quiet, hushed whispers.
Get Involved
Try it out: Download the model and test it.
Provide feedback: Let the developers know what works, what doesn't, and what features you'd like to see.
Contribute: If you're a developer, consider contributing to the project.
📄 License
This project follows the MIT license, just like Whisper.
📝 Citation
If you find our project useful, we would appreciate it if you could star our repository and cite our work as follows:
@article{EraXWoWTurboV11CT2,
  title={EraX-WoW-Turbo-V1.1-CT2: Lắng nghe để Yêu thương},
  author={Nguyễn Anh Nguyên and Phạm Huỳnh Nhật and Cty Bảo hiểm AAA (504h)},
  organization={EraX},
  year={2025},
  url={https://huggingface.co/erax-ai/EraX-WoW-Turbo-V1.1-CT2}
}