distil-whisper-large-v3-german Open Source German Speech Recognition Model - Maintain High Quality and Enable Fast Inference

Distil Whisper Large V3 German

Developed by primeline

A German speech recognition model based on distil-whisper technology, with 756 million parameters, achieving faster inference speeds while maintaining high quality.

Speech Recognition

Transformers

German

Open Source License: Apache-2.0

#German Speech Recognition #Distilled Model #Low-Latency Inference

Downloads 207

Release Time: 4/15/2024

Model Overview

A distilled model specifically designed for German speech recognition tasks, suitable for local transcription services or integration into complex speech processing pipelines.

Model Features

Efficient Inference

With roughly half the parameters of the original large model (756M versus 1.54B), it maintains good recognition quality for most tasks, making it suitable for real-time applications.

Optimized Compatibility

Can be used with optimization toolkits like TensorRT to significantly reduce latency.

Data Quality

Training data undergoes rigorous filtering and text normalization to ensure model input consistency.

Model Capabilities

German speech-to-text

Long audio processing

Timestamped transcription

Use Cases

Speech Transcription Services

Localized Transcription

Deployed as a local German speech transcription service

High-accuracy real-time transcription output

Speech Processing Pipelines

Speech Analysis Integration

Serves as the recognition component in complex speech processing systems

Efficient processing of German speech input

🚀 distil-whisper-german

This model is a German speech recognition model based on the distil-whisper technique. It has 756M parameters and a size of 1.51 GB in bfloat16 format. As a follow-up to Whisper large v3 german, we created a distilled version for faster inference with minimal quality loss.
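The stated size follows directly from the parameter count, since bfloat16 stores each parameter in 2 bytes. A minimal sketch of that arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes) and counting only the raw weights; the helper name `weights_size_gb` is illustrative:

```python
def weights_size_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# 756M parameters at 2 bytes each (bfloat16)
size_gb = weights_size_gb(756_000_000)
print(f"{size_gb:.2f} GB")  # prints "1.51 GB"
```

The same helper with `bytes_per_param=4` gives the float32 footprint, which is relevant when the model runs on CPU.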

🚀 Quick Start

The model is intended for German speech recognition tasks. It can serve as a local transcription service or be integrated into a larger speech recognition pipeline. With only about half the parameters of the large model, it still offers good quality for most tasks. When combined with optimization toolkits such as TensorRT, its low latency makes it suitable for real-time applications.

✨ Features

Fast Inference: A distilled model that delivers results faster with minimal quality loss.
Good Quality: Despite having fewer parameters, it maintains high-quality performance for most German speech recognition tasks.
Low Latency: Suitable for real-time applications when optimized.

📦 Installation

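The original model card lists no installation steps. The usage example below imports torch, transformers, and datasets, so these packages must be installed first; loading with low_cpu_mem_usage=True also requires the accelerate package.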

💻 Usage Examples

Basic Usage

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset
# Use the GPU with half precision when available, otherwise CPU with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "primeline/distil-whisper-large-v3-german"
# Load the model weights from the Hugging Face Hub.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)
# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)
# Long audio is split into 30-second chunks that are transcribed in batches.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)
# Note: this sample dataset contains English audio (LibriSpeech);
# substitute German audio for meaningful results with this model.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]
result = pipe(sample)
print(result["text"])
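Because the pipeline above is created with return_timestamps=True, the result should also carry a "chunks" list in addition to "text", where each entry holds a "timestamp" pair (start, end) in seconds and the chunk's "text". The following is a minimal sketch of turning such a list into readable lines. The helpers `format_timestamp` and `format_chunks` and the sample data are illustrative and not part of the model card; the end time is treated as optional on the assumption that the final chunk may come back without one.

```python
from typing import Optional


def format_timestamp(seconds: Optional[float]) -> str:
    """Render seconds as HH:MM:SS.mmm; a missing timestamp is shown as '?'."""
    if seconds is None:
        return "?"
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_chunks(chunks):
    """Turn a list of {'timestamp': (start, end), 'text': ...} dicts into display lines."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} -> {format_timestamp(end)}] {text}")
    return lines


# Hypothetical data shaped like result["chunks"]; not real model output.
example_chunks = [
    {"timestamp": (0.0, 4.2), "text": " Guten Morgen zusammen."},
    {"timestamp": (4.2, None), "text": " Heute sprechen wir über Spracherkennung."},
]
for line in format_chunks(example_chunks):
    print(line)
```

With a real transcription, `format_chunks(result["chunks"])` would replace the hypothetical sample list.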

📚 Documentation

Dataset

The training data is a filtered subset of the Common Voice dataset, Multilingual LibriSpeech, and some internal data. The data was carefully filtered and double-checked for quality and correctness. The text was normalized, especially with respect to casing and punctuation.
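The model card does not describe the normalization rules themselves. Purely as an illustration of what casing and punctuation normalization for German transcripts might involve, here is a hypothetical sketch; the function `normalize_transcript` and every rule in it are assumptions, not the authors' pipeline. It leaves casing inside the sentence untouched, since capitalization of nouns carries meaning in German.

```python
import re
import unicodedata


def normalize_transcript(text: str) -> str:
    """Hypothetical transcript cleanup; not the procedure used for this model."""
    # Compose characters so that umlauts are single code points (NFC).
    text = unicodedata.normalize("NFC", text)
    # Map typographic quotation marks to plain double quotes.
    text = re.sub(r"[„“”«»]", '"', text)
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove stray spaces before punctuation marks.
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    if not text:
        return text
    # Capitalize the first character only; keep all other casing as-is.
    text = text[0].upper() + text[1:]
    # Make sure the sentence ends with terminal punctuation.
    if text[-1] not in ".!?":
        text += "."
    return text


print(normalize_transcript("  das  ist ein Test !"))
```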

Model family

Property	Details
Model Type	German Speech Recognition
Training Data	A filtered subset of the Common Voice dataset, multilingual librispeech, and some internal data

Model	Parameters	Link
Whisper large v3 german	1.54B	link
Whisper large v3 turbo german	809M	link
Distil-whisper large v3 german	756M	link
tiny whisper	37.8M	link
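To put the parameter counts in the table into proportion, the short sketch below expresses each model's size relative to Whisper large v3 german. The numbers are copied from the table; the dictionary and variable names are illustrative.

```python
# Parameter counts as listed in the model family table.
family = {
    "Whisper large v3 german": 1.54e9,
    "Whisper large v3 turbo german": 809e6,
    "Distil-whisper large v3 german": 756e6,
    "tiny whisper": 37.8e6,
}

baseline = family["Whisper large v3 german"]
relative = {name: params / baseline for name, params in family.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.1%} of the large model")
```

The distilled model comes out at just under half the size of the large model, which matches the "half the parameters" description.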

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
total_train_batch_size: 512
num_epochs: 5.0
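The card reports only the total batch size. In the Hugging Face Trainer, that figure is the product of the per-device batch size, the gradient accumulation steps, and the number of devices. The split used below (32 x 2 x 8) is hypothetical and serves only to show one combination that yields 512; the actual training setup is not documented.

```python
def total_train_batch_size(
    per_device_batch_size: int,
    gradient_accumulation_steps: int,
    num_devices: int,
) -> int:
    """Effective number of samples contributing to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Hypothetical split: 32 samples per device, 2 accumulation steps, 8 devices.
effective = total_train_batch_size(32, 2, 8)
print(effective)  # prints 512
```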

Framework versions

Transformers 4.39.3
Pytorch 2.3.0a0+ebedce2
Datasets 2.18.0
Tokenizers 0.15.2
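When reproducing results, it can help to compare the local environment against the versions listed above. A minimal sketch using the standard library's importlib.metadata; the `REPORTED` mapping and the helper name are illustrative, and a mismatch does not necessarily mean the model will fail to run.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as reported in the model card.
REPORTED = {
    "transformers": "4.39.3",
    "torch": "2.3.0a0+ebedce2",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(REPORTED).items():
    print(f"{name}: installed={installed}, reported={REPORTED[name]}")
```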

🔧 Technical Details

The model is a distilled version of Whisper large v3 german, built for faster inference with minimal quality loss. It was trained on a filtered, high-quality dataset using the hyperparameters listed under Training hyperparameters.

📄 License

This model is published under the Apache 2.0 license.

About us

Your partner for AI infrastructure in Germany. Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. Optimized for AI training and inference.

Model author: Florian Zimmermeister

⚠️ Important Note

This model is not a product of the primeLine Group. It represents research conducted by Florian Zimmermeister, with computing power sponsored by primeLine. The model is published under this account by primeLine, but it is not a commercial product of primeLine Solutions GmbH. Please be aware that while we have tested and developed this model to the best of our abilities, errors may still occur. Use of this model is at your own risk. We do not accept liability for any incorrect outputs generated by this model.
