Whisper Medium EN Fine-Tuned for Air Traffic Control (ATC)
This model is a fine-tuned version of OpenAI's Whisper Medium EN model, specifically tailored for Air Traffic Control (ATC) communication. It offers high-precision transcription for aviation-related communications, significantly enhancing accuracy compared to the original model.
Quick Start
You can access the fine-tuned model on Hugging Face.
You can also test the model online using the ATC Transcription Assistant, which allows you to upload audio files and generate transcriptions.
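Below is a minimal inference sketch using the Hugging Face `transformers` ASR pipeline. The repository ID is a placeholder assumption rather than a value taken from this card; substitute the actual model ID from the Hugging Face page.

```python
# Minimal inference sketch (assumed setup, not the authors' reference code).
import torch
from transformers import pipeline

# Placeholder repository ID -- replace with the actual fine-tuned model ID.
MODEL_ID = "your-username/whisper-medium-en-atc"

asr = pipeline(
    "automatic-speech-recognition",
    model=MODEL_ID,
    device=0 if torch.cuda.is_available() else -1,
)

# Transcribe a short ATC transmission (e.g. a 16 kHz mono WAV clip).
result = asr("atc_clip.wav")
print(result["text"])
```

When given a file path, the pipeline typically handles audio decoding and resampling internally (ffmpeg is required for most formats), so no manual preprocessing is needed for a quick test.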
Features
- Enhanced Transcription Accuracy: The fine-tuning process reduces the Word Error Rate (WER) by 84% on domain-specific aviation communications compared to the original pretrained model.
- Handling Variations: It is particularly effective at dealing with accent variations and ambiguous phrasing commonly found in ATC communications.
- Optimized for ATC: Specifically designed to handle short, distinct transmissions between pilots and air traffic controllers.
Documentation
Model Overview
This model is a fine-tuned version of OpenAI's Whisper Medium EN model, trained on Air Traffic Control (ATC) communication datasets. The fine-tuning brings substantial improvements in transcription accuracy for aviation-specific communications.
| Property | Details |
|----------|---------|
| Base Model | OpenAI Whisper Medium EN |
| Fine-tuned Model WER | 15.08% |
| Pretrained Model WER | 94.59% |
| Relative Improvement | 84.06% |
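The relative improvement follows directly from the two WER values:

$$
\text{Relative Improvement} = \frac{94.59 - 15.08}{94.59} \times 100\% \approx 84.06\%
$$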
Model Description
Whisper Medium EN fine-tuned for ATC is optimized for short transmissions between pilots and controllers. It is trained on the ATC Dataset, which combines and cleans data from two source corpora, filtering and refining the data to enhance transcription accuracy for ATC communications.
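For reference, the combined dataset can be loaded with the `datasets` library using the identifier listed under Technical Details below. The split and column names in this sketch are assumptions and may differ from the published dataset.

```python
# Sketch of loading the combined ATC Dataset (split/column names are assumptions).
from datasets import load_dataset

atc = load_dataset("jacktol/atc-dataset")
print(atc)                 # inspect the available splits

example = atc["train"][0]  # assumes a "train" split exists
print(example.keys())      # expect an audio field plus a reference transcript
```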
Intended Use
- Transcribing aviation communication: Provide accurate transcriptions for ATC communications, accounting for accents and English phrasing variations.
- Air Traffic Control Systems: Assist in real-time transcription of pilot-ATC conversations to improve situational awareness.
- Research and training: Useful for researchers, developers, or aviation professionals studying ATC communication or developing new aviation - safety tools.
Training Procedure
- Hardware: Fine-tuning was conducted on two A100 GPUs with 80 GB of memory.
- Epochs: 10
- Learning Rate: 1e-5
- Batch Size: 32 (effective batch size with gradient accumulation)
- Augmentation: Dynamic data augmentation techniques (Gaussian noise, pitch shifting, etc.) were applied during training.
- Evaluation Metric: Word Error Rate (WER); see the configuration sketch below.
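As a rough illustration of how the hyperparameters above might map onto `Seq2SeqTrainingArguments` from `transformers`, consider the sketch below. The per-device batch size / gradient-accumulation split, the mixed-precision flag, and the remaining settings are assumptions rather than the exact configuration used.

```python
# Hedged training-configuration sketch; an assumed mapping of the listed
# hyperparameters, not the authors' exact setup.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-en-atc",
    num_train_epochs=10,               # Epochs: 10
    learning_rate=1e-5,                # Learning Rate: 1e-5
    per_device_train_batch_size=8,     # 8 per GPU x 2 A100s ...
    gradient_accumulation_steps=2,     # ... x 2 steps = effective batch size 32
    fp16=True,                         # assumed mixed precision on A100 GPUs
    predict_with_generate=True,        # generate text during eval so WER can be computed
)
```

Dynamic augmentations such as Gaussian noise and pitch shifting would typically be applied on the fly in the dataset or data collator rather than through the trainer arguments.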
Limitations
While the fine-tuned model performs well on ATC-specific communications, it may not generalize effectively to other speech domains. As with most speech-to-text models, transcription accuracy can also be affected by extremely poor-quality audio or heavily accented speech not encountered during training.
Technical Details
The model-index information is as follows (a sketch showing how the WER metric can be computed appears after the list):
- Name: Whisper Medium EN Fine-Tuned for ATC
- Results:
  - Task:
    - Type: automatic-speech-recognition
  - Dataset:
    - Name: ATC Dataset
    - Type: jacktol/atc-dataset
  - Metrics:
    - Name: Word Error Rate (WER)
    - Type: wer
    - Value: 15.08
  - Source:
    - Name: ATC Transcription Evaluation
    - URL: https://jacktol.net/posts/fine-tuning_whisper_for_atc/
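A WER figure of this kind can be computed with the `evaluate` library. The snippet below is illustrative only; it is not the authors' evaluation script, and the transcripts are made-up examples.

```python
# Illustrative WER computation with the `evaluate` library.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["descend and maintain flight level two four zero"]
references  = ["descend and maintain flight level two five zero"]

score = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {score * 100:.2f}%")
```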
License
This model is released under the MIT license.
References
- Blog Post: [Fine-Tuning Whisper for ATC: 84% Improvement in Transcription Accuracy](https://jacktol.net/posts/fine-tuning_whisper_for_atc/)
- GitHub Repository: [Fine-Tuning Whisper on ATC Data](https://github.com/jack-tol/fine-tuning-whisper-on-atc-data/tree/main)