distil-large-v3-ct2: Open-Source Long-Form Transcription Model with Fast Inference and Low Word Error Rate

Distil Large V3 Ct2

Developed by distil-whisper

Distil-Whisper is a distilled version of OpenAI's Whisper model. This checkpoint is optimized for long-form transcription and offers faster inference than Whisper, along with a lower word error rate (WER) than earlier Distil-Whisper releases.

Speech Recognition | English | Open Source License: MIT | #Long-form speech recognition #Efficient inference engine #Low word error rate


Release Time : 3/21/2024

Model Overview

This model provides the distil-large-v3 weights converted to CTranslate2 format. distil-large-v3 is specifically designed to be compatible with OpenAI Whisper's long-form transcription algorithm and achieves an average 5% improvement in word error rate (WER) over distil-large-v2 across 4 out-of-distribution datasets.

Model Features

Efficient Inference

Fast inference enabled by the CTranslate2 engine, suitable for real-time speech recognition applications.

Long-form Optimization

Specially designed to be compatible with OpenAI Whisper's long-form transcription algorithm, delivering better performance with long audio files.

Performance Improvement

Compared to the distil-large-v2 version, it achieves an average 5% improvement in word error rate (WER) across 4 out-of-distribution datasets.

Model Capabilities

English speech recognition

Long audio transcription

Real-time speech-to-text

Use Cases

Speech Transcription

Meeting Minutes

Automatically convert meeting recordings into text transcripts.

High accuracy, supports long-duration recordings.

Podcast Transcription

Convert podcast audio content into searchable text.

Excellent performance with long audio files.

🚀 Distil-Whisper: distil-large-v3 for CTranslate2

This repository provides the model weights of distil-large-v3 converted to CTranslate2 format. CTranslate2 is a high-speed inference engine for Transformer models and serves as the supported backend for the Faster-Whisper package.

🚀 Quick Start


Unlike previous Distil-Whisper releases, distil-large-v3 is specifically designed to be compatible with the OpenAI Whisper long-form transcription algorithm. In our benchmark over 4 out-of-distribution datasets, distil-large-v3 outperformed distil-large-v2 by an average of 5% WER. You can therefore expect lower error rates on long-form audio by switching to this latest checkpoint.

✨ Features

The model weights of distil-large-v3 are converted to CTranslate2 format, enabling fast inference.
Compatible with the OpenAI Whisper long-form transcription algorithm.
Outperforms distil-large-v2 by an average of 5% WER in benchmarks over 4 out-of-distribution datasets.
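
For readers unfamiliar with the metric, the sketch below shows one way a single WER value can be computed: the word-level edit distance between a reference transcript and a model transcript, divided by the number of reference words. It is only an illustration. The function name is hypothetical, and the text normalisation here (lowercasing and whitespace splitting) is a simplifying assumption; published benchmarks generally use a more thorough normaliser, so the numbers from this sketch are not comparable to the figures quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split stands in for a real normaliser
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because WER is normalised by the reference length, it can exceed 1.0 when the hypothesis contains many inserted words.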

📦 Installation

To use the model in Faster-Whisper, first install the PyPI package according to the official instructions. For this example, we'll also install 🤗 Datasets to load a toy audio dataset from the Hugging Face Hub:

pip install --upgrade pip
pip install --upgrade git+https://github.com/SYSTRAN/faster-whisper "datasets[audio]"

💻 Usage Examples

Basic Usage

The following code snippet loads the distil-large-v3 model and runs inference on an example file from the LibriSpeech ASR dataset:

import torch
from faster_whisper import WhisperModel
from datasets import load_dataset

# define our device configuration (torch is used only to detect CUDA availability)
# CTranslate2 expects "cuda" or "cpu"; select a specific GPU with device_index
device = "cuda" if torch.cuda.is_available() else "cpu"
compute_type = "float16" if torch.cuda.is_available() else "float32"

# load model on GPU if available, else cpu
model = WhisperModel("distil-large-v3", device=device, compute_type=compute_type)

# load toy dataset for example
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = dataset[1]["audio"]["path"]

segments, info = model.transcribe(sample, beam_size=1)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
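
In Faster-Whisper, segments is a generator, so transcription only runs as you iterate over it. For the podcast and meeting use cases, it can be convenient to write the segments out as subtitles rather than printing them. The following is a minimal sketch of that idea: srt_timestamp and segments_to_srt are hypothetical helper names, not part of Faster-Whisper, and FakeSegment is a stand-in with the same start, end and text attributes used in the loop above.

```python
from collections import namedtuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from objects exposing .start, .end and .text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive subtitle entries
    return "\n".join(blocks)


# Stand-in segments for illustration; pass the real generator in practice
FakeSegment = namedtuple("FakeSegment", ["start", "end", "text"])
demo = [FakeSegment(0.0, 2.5, " Hello there."), FakeSegment(2.5, 4.0, " Goodbye.")]
print(segments_to_srt(demo))
```

Since the generator can only be consumed once, pass it to segments_to_srt directly or convert it to a list first if the segments are needed again.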

Advanced Usage

To transcribe a local audio file, simply pass the path to the audio file as the audio argument to transcribe:

segments, info = model.transcribe("audio.mp3", beam_size=1)
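
For long recordings, a small wrapper that joins all segments into one transcript can be handy. The sketch below is one possible shape, not the authors' recommended pipeline. transcribe_to_text is a hypothetical helper name. The language and condition_on_previous_text arguments are existing transcribe options, but choosing these particular values is an assumption on our part: language="en" skips language detection for this English model, and condition_on_previous_text=False stops the previous window's output from being fed in as a prompt for the next one.

```python
def transcribe_to_text(model, audio_path: str):
    """Transcribe a file and return (full_text, info).

    `model` is expected to behave like a faster_whisper.WhisperModel.
    """
    segments, info = model.transcribe(
        audio_path,
        beam_size=1,
        language="en",                     # assumption: input audio is English
        condition_on_previous_text=False,  # assumption: decode windows independently
    )
    # Iterating the generator is what actually runs the transcription
    text = " ".join(segment.text.strip() for segment in segments)
    return text, info


# Usage with the model loaded earlier:
# text, info = transcribe_to_text(model, "audio.mp3")
# print("Audio duration: %.1fs" % info.duration)
# print(text)
```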

📚 Documentation

For more information about the distil-large-v3 model, refer to the original model card.

📄 License

Distil-Whisper inherits the MIT license from OpenAI's Whisper model.

📚 Citation

If you use this model, please consider citing the Distil-Whisper paper:

@misc{gandhi2023distilwhisper,
      title={Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling}, 
      author={Sanchit Gandhi and Patrick von Platen and Alexander M. Rush},
      year={2023},
      eprint={2311.00430},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
