Distil-Whisper: Distil-Large-v3.5
Distil-Whisper is a knowledge-distilled version of OpenAI's Whisper-Large-v3 for automatic speech recognition, offering high efficiency with accuracy that improves on earlier Distil-Whisper releases.
Distil-Whisper is the knowledge-distilled version of OpenAI's Whisper-Large-v3, described in the paper Robust Knowledge Distillation via Large-Scale Pseudo Labelling. As the newest addition to the Distil-Whisper English family, Distil-Large-v3.5 maintains the high efficiency of its predecessors while delivering better performance.
Compared to earlier models, Distil-Large-v3.5 has been trained on over 4× more diverse public data (98k hours) and uses a "patient" teacher with an extended training schedule and aggressive data augmentation (SpecAugment) during distillation. This results in enhanced robustness and accuracy compared to previous Distil-Whisper models, making it suitable as a drop-in replacement.
Why consider Distil-Large-v3.5 when Whisper-Large-v3-Turbo already exists?
- It offers a different balance between accuracy and efficiency: it remains ~1.5x faster than Whisper-Large-v3-Turbo while performing slightly better on short-form transcription and falling ~1% behind on long-form transcription.
- It works perfectly as a draft model for speculative decoding with Whisper-Large-v3. Because the encoder was kept frozen during training, only two extra decoder layers need to be loaded, and the encoder forward pass runs just once. This achieves ~2x faster inference compared to Whisper-Large-v3 while maintaining identical outputs.
This model is a 🤗 collaborative effort between Bofeng Huang, Eustache Le Bihan, Steven Zheng, Vaibhav Srivastav, and Joshua Lochner.
Quick Start
Distil-Whisper's Distil-Large-v3.5 is a powerful model for automatic speech recognition, and you can get started with it in just a few steps.
Installation
If you haven't already, you can install the Transformers.js JavaScript library from NPM using:
```bash
npm i @huggingface/transformers
```
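Alternatively, Transformers.js can be loaded directly from a CDN for quick in-browser experiments without a bundler. The snippet below is a minimal sketch: the unversioned jsdelivr URL is an assumption, and in practice you should pin a specific release.

```js
// Minimal sketch: import Transformers.js from a CDN inside a <script type="module"> tag.
// The unversioned URL is an assumption; pin a release, e.g. .../npm/@huggingface/transformers@<version>.
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';
```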
Usage Examples
Basic Usage
You can then transcribe audio as follows:
```js
import { pipeline } from '@huggingface/transformers';

// Create an automatic speech recognition pipeline with the ONNX weights
const transcriber = await pipeline('automatic-speech-recognition', 'distil-whisper/distil-large-v3.5-ONNX');

// Transcribe an audio file from a URL
const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
const output = await transcriber(url);
console.log(output.text);
```
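The pipeline also accepts options for longer audio and richer output. The sketch below uses option names from the Transformers.js ASR pipeline (chunked long-form transcription, timestamps, and device/dtype selection); exact availability may vary with your installed version, so treat it as a hedged example rather than a definitive reference.

```js
import { pipeline } from '@huggingface/transformers';

// Optionally pick a device and quantization level when creating the pipeline
// (e.g. WebGPU in the browser). These options are a sketch based on the
// Transformers.js API and may vary by version.
const transcriber = await pipeline(
  'automatic-speech-recognition',
  'distil-whisper/distil-large-v3.5-ONNX',
  { device: 'webgpu', dtype: 'fp32' },
);

const url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';

// Split long audio into 30-second windows with a 5-second stride and
// return timestamped segments alongside the full text.
const output = await transcriber(url, {
  chunk_length_s: 30,
  stride_length_s: 5,
  return_timestamps: true,
});

console.log(output.text);    // full transcription
console.log(output.chunks);  // timestamped segments
```

Chunking with a stride lets the model handle audio longer than Whisper's 30-second context window by transcribing overlapping windows and stitching the segments back together.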
License
This project is licensed under the MIT license.