Whisper Tiny German
This is a German speech recognition model based on the whisper-tiny model, designed for edge use cases (such as resource-constrained devices) where model size is a concern.
Quick Start
The following example shows how to use the whisper-tiny-german model for automatic speech recognition:
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use the GPU with half precision when available, otherwise fall back to CPU/float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "primeline/whisper-tiny-german"

# Load the model weights and move them to the selected device.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

# Build a chunked ASR pipeline so that audio longer than 30 seconds can be transcribed.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Note: librispeech_long contains English audio and only demonstrates the call pattern;
# substitute a German recording to get a meaningful transcript.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])
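Because the pipeline above is created with return_timestamps=True, the result should also carry a list of timestamped segments under result["chunks"], each with a "timestamp" (start, end) tuple in seconds and a "text" string. The following is a minimal sketch of how those segments could be printed. The helpers format_timestamp and format_chunks, as well as the sample_result dictionary, are hypothetical and only stand in for a real pipeline result so that the snippet runs without downloading the model.

```python
def format_timestamp(seconds):
    """Render seconds as MM:SS.ss; an unknown time (None) becomes '??:??.??'."""
    if seconds is None:
        return "??:??.??"
    # Round first so that values such as 59.999 roll over to the next minute.
    minutes, secs = divmod(round(seconds, 2), 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def format_chunks(result):
    """Turn the 'chunks' list of an ASR pipeline result into printable lines."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}")
    return lines


# Hypothetical stand-in for the dictionary returned by pipe(sample).
sample_result = {
    "text": " Guten Morgen. Wie geht es Ihnen?",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": " Guten Morgen."},
        # The end time of the final segment can be missing (None).
        {"timestamp": (1.5, None), "text": " Wie geht es Ihnen?"},
    ],
}

for line in format_chunks(sample_result):
    print(line)
```

With a real result, format_chunks(result) can be called directly on the value returned by pipe(sample).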
Features
- Compact Size: The model has 37.8M parameters, and its weights take only 73MB in bfloat16 format, making it suitable for edge devices with limited resources.
- German Speech Recognition: Specifically designed for German speech recognition tasks.
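The size figure above can be sanity-checked with simple arithmetic: bfloat16 stores each parameter in 2 bytes. The sketch below is only an estimate under that assumption. The helper names are hypothetical, and the exact file size also depends on the unrounded parameter count and on file metadata, so a small deviation from the stated 73MB is expected.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_size_mb(num_params, dtype):
    """Approximate weight size in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


def weight_size_mib(num_params, dtype):
    """Approximate weight size in mebibytes (2**20 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20


num_params = 37.8e6  # parameter count stated in this model card

for dtype in ("bfloat16", "float32"):
    mb = weight_size_mb(num_params, dtype)
    mib = weight_size_mib(num_params, dtype)
    print(f"{dtype}: about {mb:.1f} MB ({mib:.1f} MiB)")
```

The bfloat16 estimate comes out at roughly 75.6 MB (about 72.1 MiB), which is in line with the 73MB quoted above.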
Documentation
Intended uses & limitations
The model is intended for German speech recognition tasks, especially in edge use cases where model size is a concern. However, it is not recommended for critical use cases, as its small size means it may not perform well in all scenarios.
Dataset
The training data is a filtered subset of the Common Voice dataset, Multilingual LibriSpeech, and some internal data. The data was carefully filtered and double-checked for quality and correctness, and the text was normalized, especially with respect to casing and punctuation.
Model family
| Property | Details |
|---|---|
| Model Type | German Speech Recognition Model |
| Training Data | Filtered subset of Common Voice dataset, Multilingual LibriSpeech, and some internal data |

| Model | Parameters | Link |
|---|---|---|
| Whisper large v3 german | 1.54B | link |
| Whisper large v3 turbo german | 809M | link |
| Distil-whisper large v3 german | 756M | link |
| tiny whisper | 37.8M | link |
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- total_train_batch_size: 512
- num_epochs: 5.0
Framework versions
- Transformers 4.39.3
- PyTorch 2.3.0a0+ebedce2
- Datasets 2.18.0
- Tokenizers 0.15.2
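To compare a local environment against the versions listed above, a small check along the following lines may help. This is only a sketch: installed_versions is a hypothetical helper, the package names are the usual PyPI distribution names (an assumption, since this card lists display names only), and the listed PyTorch version is a pre-release build, so an exact match is not expected on most systems.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this model card, keyed by assumed PyPI distribution name.
LISTED_VERSIONS = {
    "transformers": "4.39.3",
    "torch": "2.3.0a0+ebedce2",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(LISTED_VERSIONS).items():
    shown = installed or "not installed"
    print(f"{name}: installed={shown}, listed={LISTED_VERSIONS[name]}")
```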
License
This model is published under the Apache-2.0 license.

Your partner for AI infrastructure in Germany. Experience the powerful AI infrastructure that drives your ambitions in Deep Learning, Machine Learning & High-Performance Computing. Optimized for AI training and inference.
Model author: Florian Zimmermeister
Disclaimer
This model is not a product of the primeLine Group.
It represents research conducted by [Florian Zimmermeister](https://huggingface.co/flozi00), with computing power sponsored by primeLine.
The model is published under this account by primeLine, but it is not a commercial product of primeLine Solutions GmbH.
Please be aware that while we have tested and developed this model to the best of our abilities, errors may still occur.
Use of this model is at your own risk. We do not accept liability for any incorrect outputs generated by this model.