🚀 Whisper Large v3 Turbo TR - Selim Çavaş
This model is a fine-tuned version of openai/whisper-large-v3-turbo on the Common Voice 17.0 dataset. It is intended for Turkish speech recognition and achieves high accuracy when transcribing Turkish audio.
🚀 Quick Start
Basic Usage
```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision if available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "selimc/whisper-large-v3-turbo-turkish"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,       # split long audio into 30-second chunks
    batch_size=16,
    return_timestamps=True,  # also return per-segment timestamps
    torch_dtype=torch_dtype,
    device=device,
)

result = pipe("test.mp3")
print(result["text"])
```
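To try the pipeline on a Common Voice sample instead of a local file, something like the following should work, continuing from the setup above. This assumes you have the `datasets` library installed and have accepted the dataset's terms on the Hugging Face Hub:

```python
from datasets import Audio, load_dataset

# Stream a single sample from the Turkish test split and resample it
# to the 16 kHz rate Whisper expects.
dataset = load_dataset(
    "mozilla-foundation/common_voice_17_0", "tr", split="test", streaming=True
)
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
sample = next(iter(dataset))["audio"]

result = pipe(sample)
print(result["text"])
```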
✨ Features
- Versatile Applications: The model can be used across a range of applications, including transcription of Turkish speech, voice commands, and automatic subtitling for Turkish videos.
- High Accuracy: Achieves a word error rate (WER) of 18.9229 on the Common Voice 17.0 test set.
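For context, WER counts the minimum number of word-level substitutions, deletions, and insertions needed to turn a hypothesis into its reference transcript, divided by the number of reference words, and is usually reported as a percentage. A minimal sketch using the `evaluate` library (this assumes `evaluate` and its `jiwer` dependency are installed; the example strings are made up):

```python
import evaluate

wer_metric = evaluate.load("wer")

# Hypothetical example: one deleted word out of three reference words -> WER ~33.33%.
wer = wer_metric.compute(
    predictions=["merhaba dünya"],
    references=["merhaba güzel dünya"],
)
print(f"WER: {100 * wer:.2f}%")
```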
📦 Installation
The code example above assumes you have installed the necessary libraries. You can install them using the following command:
```bash
pip install transformers torch datasets tokenizers
```
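Note: passing a file path such as `test.mp3` to the pipeline relies on `ffmpeg` being installed on your system for audio decoding, and `low_cpu_mem_usage=True` also needs the `accelerate` package (`pip install accelerate`).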
📚 Documentation
Intended uses & limitations
This model can be used in various application areas, including:
- Transcription of Turkish speech
- Voice commands
- Automatic subtitling for Turkish videos
How To Use
The code example in the "Quick Start" section demonstrates how to use the model for automatic speech recognition.
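For the subtitling use case, one possible approach (continuing from the Quick Start setup, which already enables `return_timestamps=True`) is to convert the pipeline's timestamped chunks into SRT subtitles. The `to_srt` helper and the file name below are hypothetical, not part of the model or the transformers library:

```python
def to_srt(chunks):
    """Convert pipeline chunks ({"timestamp": (start, end), "text": ...}) to SRT."""
    def fmt(seconds):
        if seconds is None:  # the final chunk can have an open-ended timestamp
            seconds = 0.0
        h, rem = divmod(seconds, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02}:{int(m):02}:{int(s):02},{int((s % 1) * 1000):03}"

    entries = []
    for i, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        entries.append(f"{i}\n{fmt(start)} --> {fmt(end)}\n{chunk['text'].strip()}\n")
    return "\n".join(entries)

result = pipe("video_audio.mp3")  # hypothetical audio track extracted from a video
print(to_srt(result["chunks"]))
```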
Training
Due to Colab GPU constraints, only 25% of the Turkish data available in the Common Voice 17.0 dataset was used for training. 😔
Got a GPU to spare? Let's collaborate and take this model to the next level! 🚀
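For reference, one simple way to take a 25% slice of the Turkish training split with the `datasets` library is shown below. The exact subsetting strategy used for this model is not documented, so treat this as an assumption (the dataset also requires accepting its terms on the Hugging Face Hub):

```python
from datasets import load_dataset

# Hypothetical reconstruction: take the first 25% of the Turkish train split.
common_voice_tr = load_dataset(
    "mozilla-foundation/common_voice_17_0", "tr", split="train[:25%]"
)
print(common_voice_tr)
```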
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 4000
- mixed_precision_training: Native AMP
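As a rough sketch, these hyperparameters map onto transformers' `Seq2SeqTrainingArguments` as follows. The training script was not published, so the output directory, evaluation schedule, and other unlisted settings are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v3-turbo-turkish",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=4000,
    fp16=True,  # Native AMP mixed precision
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults.
    eval_strategy="steps",
    eval_steps=1000,  # assumed from the 1000-step evaluation log below
    predict_with_generate=True,  # assumed, needed to compute WER during eval
)
```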
Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.1223 | 1.6 | 1000 | 0.3187 | 24.4415 |
| 0.0501 | 3.2 | 2000 | 0.3123 | 20.9720 |
| 0.0226 | 4.8 | 3000 | 0.3010 | 19.6183 |
| 0.001  | 6.4 | 4000 | 0.3123 | 18.9229 |
Framework versions
- Transformers 4.45.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
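To approximate the original environment, the listed versions can be pinned (exact CUDA builds may differ on your machine):

```bash
pip install transformers==4.45.2 torch==2.4.1 datasets==3.0.1 tokenizers==0.20.1
```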
📄 License
This model is released under the MIT license.
📦 Model Information
| Property | Details |
|----------|---------|
| Library Name | transformers |
| Base Model | openai/whisper-large-v3-turbo |
| Tags | generated_from_trainer |
| Datasets | mozilla-foundation/common_voice_17_0 |
| Metrics | wer |
| Model Name | Whisper Large v3 Turbo TR - Selim Çavaş |
| Evaluation Results | Loss: 0.3123, WER: 18.9229 |