Whisper Medium EN Fine-Tuned for Air Traffic Control (ATC)
This model is a fine-tuned version of OpenAI's Whisper Medium EN model, specifically tailored for Air Traffic Control (ATC) communication. It offers high-precision transcription for aviation-related communications, significantly enhancing accuracy compared to the original model.
Quick Start
You can access the fine-tuned model on Hugging Face.
You can also test the model online using the ATC Transcription Assistant, which allows you to upload audio files and generate transcriptions.
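Below is a minimal inference sketch using the Hugging Face `transformers` ASR pipeline. The repository ID is a placeholder assumption rather than a value taken from this card; substitute the actual model ID from the Hugging Face page.

```python
# Minimal inference sketch (assumed setup, not the authors' reference code).
import torch
from transformers import pipeline

# Placeholder repository ID -- replace with the actual fine-tuned model ID.
MODEL_ID = "your-username/whisper-medium-en-atc"

asr = pipeline(
    "automatic-speech-recognition",
    model=MODEL_ID,
    device=0 if torch.cuda.is_available() else -1,
)

# Transcribe a short ATC transmission (e.g. a 16 kHz mono WAV clip).
result = asr("atc_clip.wav")
print(result["text"])
```

When given a file path, the pipeline typically handles audio decoding and resampling internally (ffmpeg is required for most formats), so no manual preprocessing is needed for a quick test.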
Features
- Enhanced Transcription Accuracy: The fine-tuning process reduces the Word Error Rate (WER) by 84% on domain-specific aviation communications compared to the original pretrained model.
- Handling Variations: It is particularly effective at dealing with accent variations and ambiguous phrasing commonly found in ATC communications.
- Optimized for ATC: Specifically designed to handle short, distinct transmissions between pilots and air traffic controllers.
Documentation
Model Overview
This model is a fine-tuned version of OpenAI's Whisper Medium EN model, trained on Air Traffic Control (ATC) communication datasets. The fine-tuning brings substantial improvements in transcription accuracy for aviation-specific communications.
| Property | Details |
|----------|---------|
| Base Model | OpenAI Whisper Medium EN |
| Fine-tuned Model WER | 15.08% |
| Pretrained Model WER | 94.59% |
| Relative Improvement | 84.06% |
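The relative improvement follows directly from the two WER values:

$$
\text{Relative Improvement} = \frac{94.59 - 15.08}{94.59} \times 100\% \approx 84.06\%
$$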
Model Description
Whisper Medium EN fine-tuned for ATC is optimized for short transmissions between pilots and controllers. It is trained on the ATC Dataset, which combines and cleans data from two source corpora, filtering and refining the data to enhance transcription accuracy for ATC communications.
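For reference, the combined dataset can be loaded with the `datasets` library using the identifier listed under Technical Details below. The split and column names in this sketch are assumptions and may differ from the published dataset.

```python
# Sketch of loading the combined ATC Dataset (split/column names are assumptions).
from datasets import load_dataset

atc = load_dataset("jacktol/atc-dataset")
print(atc)                 # inspect the available splits

example = atc["train"][0]  # assumes a "train" split exists
print(example.keys())      # expect an audio field plus a reference transcript
```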
Intended Use
- Transcribing aviation communication: Provide accurate transcriptions for ATC communications, accounting for accents and English phrasing variations.
- Air Traffic Control Systems: Assist in real-time transcription of pilot-ATC conversations to improve situational awareness.
- Research and training: Useful for researchers, developers, or aviation professionals studying ATC communication or developing new aviation - safety tools.
Training Procedure
- Hardware: Fine-tuning was conducted on two A100 GPUs with 80 GB of memory.
- Epochs: 10
- Learning Rate: 1e-5
- Batch Size: 32 (effective batch size with gradient accumulation)
- Augmentation: Dynamic data augmentation techniques (Gaussian noise, pitch shifting, etc.) were applied during training.
- Evaluation Metric: Word Error Rate (WER); see the configuration sketch below.
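As a rough illustration of how the hyperparameters above might map onto `Seq2SeqTrainingArguments` from `transformers`, consider the sketch below. The per-device batch size / gradient-accumulation split, the mixed-precision flag, and the remaining settings are assumptions rather than the exact configuration used.

```python
# Hedged training-configuration sketch; an assumed mapping of the listed
# hyperparameters, not the authors' exact setup.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-en-atc",
    num_train_epochs=10,               # Epochs: 10
    learning_rate=1e-5,                # Learning Rate: 1e-5
    per_device_train_batch_size=8,     # 8 per GPU x 2 A100s ...
    gradient_accumulation_steps=2,     # ... x 2 steps = effective batch size 32
    fp16=True,                         # assumed mixed precision on A100 GPUs
    predict_with_generate=True,        # generate text during eval so WER can be computed
)
```

Dynamic augmentations such as Gaussian noise and pitch shifting would typically be applied on the fly in the dataset or data collator rather than through the trainer arguments.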
Limitations
While the fine-tuned model performs well on ATC-specific communications, it may not generalize effectively to other speech domains. As with most speech-to-text models, transcription accuracy can also be affected by extremely poor-quality audio or heavily accented speech not encountered during training.
Technical Details
The model-index information is as follows (a sketch showing how the WER metric can be computed appears after the list):
- Name: Whisper Medium EN Fine-Tuned for ATC
- Results:
  - Task:
    - Type: automatic-speech-recognition
  - Dataset:
    - Name: ATC Dataset
    - Type: jacktol/atc-dataset
  - Metrics:
    - Name: Word Error Rate (WER)
    - Type: wer
    - Value: 15.08
  - Source:
    - Name: ATC Transcription Evaluation
    - URL: https://jacktol.net/posts/fine-tuning_whisper_for_atc/
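A WER figure of this kind can be computed with the `evaluate` library. The snippet below is illustrative only; it is not the authors' evaluation script, and the transcripts are made-up examples.

```python
# Illustrative WER computation with the `evaluate` library.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["descend and maintain flight level two four zero"]
references  = ["descend and maintain flight level two five zero"]

score = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {score * 100:.2f}%")
```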
License
This model is released under the MIT license.
References
- Blog Post: [Fine-Tuning Whisper for ATC: 84% Improvement in Transcription Accuracy](https://jacktol.net/posts/fine-tuning_whisper_for_atc/)
- GitHub Repository: [Fine-Tuning Whisper on ATC Data](https://github.com/jack-tol/fine-tuning-whisper-on-atc-data/tree/main)