🚀 Whisper Medium EN Fine-Tuned for Air Traffic Control (ATC) - Faster-Whisper Optimized
This model is a fine-tuned version of OpenAI's Whisper Medium EN, trained on Air Traffic Control (ATC) communication datasets to greatly enhance transcription accuracy for aviation-specific communications. It reduces the Word Error Rate (WER) by 84% relative to the original pretrained model and handles the accent variations and ambiguous phrasing common in ATC communications. The model has also been converted to an optimized `.bin` format for Faster-Whisper, enabling faster and more efficient inference.
✨ Features
- Enhanced Accuracy: Significantly reduces the Word Error Rate (WER) for ATC communications.
- Domain-Specific Training: Trained on ATC communication datasets to handle aviation-specific language.
- Optimized for Faster Inference: Converted to a `.bin` format compatible with Faster-Whisper.
📦 Installation
The model runs with the [Faster-Whisper](https://github.com/SYSTRAN/faster-whisper) library, which can be installed from PyPI with `pip install faster-whisper`.
💻 Usage Examples
The fine-tuned model can be loaded directly from the Hugging Face Hub with Faster-Whisper.
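A minimal usage sketch (assuming `faster-whisper` is installed; the audio file name and the CPU/int8 settings are placeholder choices, not requirements):

```python
# Repo ID of the CTranslate2-converted model on the Hugging Face Hub
MODEL_ID = "jacktol/whisper-medium.en-fine-tuned-for-ATC-faster-whisper"

def transcribe_atc(audio_path: str) -> str:
    """Transcribe an ATC recording and return the joined segment text."""
    # Imported lazily so the sketch loads even without the dependency installed
    from faster_whisper import WhisperModel

    # int8 on CPU keeps the sketch runnable without a GPU; use
    # device="cuda", compute_type="float16" for faster inference
    model = WhisperModel(MODEL_ID, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, language="en", beam_size=5)
    return " ".join(segment.text.strip() for segment in segments)

# Example (requires faster-whisper and a local audio file):
# print(transcribe_atc("atc_sample.wav"))
```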
📚 Documentation
Model Overview
This model is a fine-tuned version of OpenAI's Whisper Medium EN model, specifically trained on Air Traffic Control (ATC) communication datasets. The fine-tuning process significantly improves transcription accuracy on domain-specific aviation communications, reducing the Word Error Rate (WER) by 84% compared to the original pretrained model. The model is particularly effective at handling accent variations and ambiguous phrasing often encountered in ATC communications.
This model has been converted to an optimized `.bin` format, making it compatible with Faster-Whisper for faster and more efficient inference.
- Base Model: OpenAI Whisper Medium EN
- Fine-tuned Model WER: 15.08%
- Pretrained Model WER: 94.59%
- Relative Improvement: 84.06%
- Optimized Format: Compatible with Faster-Whisper
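The relative improvement follows directly from the two WER values above; a quick check of the arithmetic:

```python
# WER values reported above (in percent)
pretrained_wer = 94.59
fine_tuned_wer = 15.08

# Relative WER reduction: (old - new) / old, expressed as a percentage
relative_improvement = (pretrained_wer - fine_tuned_wer) / pretrained_wer * 100
print(f"{relative_improvement:.2f}%")  # prints 84.06%
```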
You can access the fine-tuned model on Hugging Face:
- [Whisper Medium EN Fine-Tuned for ATC](https://huggingface.co/jacktol/whisper-medium.en-fine-tuned-for-ATC)
- [Whisper Medium EN Fine-Tuned for ATC (Faster Whisper)](https://huggingface.co/jacktol/whisper-medium.en-fine-tuned-for-ATC-faster-whisper)
Model Description
Whisper Medium EN fine-tuned for ATC is optimized to handle short, distinct transmissions between pilots and air traffic controllers. It is fine-tuned using data from the [ATC Dataset](https://huggingface.co/datasets/jacktol/atc-dataset), a combined and cleaned dataset sourced from the ATCO2 corpus and the UWB-ATCC corpus.
The ATC Dataset merges these two original sources, filtering and refining the data to enhance transcription accuracy for domain-specific ATC communications. The model has been further optimized to a `.bin` format for compatibility with Faster-Whisper, ensuring faster and more efficient processing.
Intended Use
The fine-tuned Whisper model is designed for:
- Transcribing aviation communication: Providing accurate transcriptions for ATC communications, including accents and variations in English phrasing.
- Air Traffic Control Systems: Assisting in real-time transcription of pilot-ATC conversations, helping improve situational awareness.
- Research and training: Useful for researchers, developers, or aviation professionals studying ATC communication or developing new tools for aviation safety.
You can test the model online using the [ATC Transcription Assistant](https://huggingface.co/spaces/jacktol/ATC-Transcription-Assistant), which lets you upload audio files and generate transcriptions.
Training Procedure
- Hardware: Fine-tuning was conducted on two A100 GPUs with 80GB memory.
- Epochs: 10
- Learning Rate: 1e-5
- Batch Size: 32 (effective batch size with gradient accumulation)
- Augmentation: Dynamic data augmentation techniques (Gaussian noise, pitch shifting, etc.) were applied during training.
- Evaluation Metric: Word Error Rate (WER)
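The dynamic augmentation step can be illustrated with a minimal NumPy sketch (the function name, parameter values, and transform choices here are illustrative assumptions; the original pipeline's exact settings are not specified, and pitch shifting is typically done with an audio library such as librosa):

```python
import numpy as np

def augment_waveform(wav, noise_std=0.005, gain_range=(0.8, 1.2), rng=None):
    """Apply simple dynamic augmentation: additive Gaussian noise plus random gain."""
    if rng is None:
        rng = np.random.default_rng()
    augmented = wav + rng.normal(0.0, noise_std, size=wav.shape)  # Gaussian noise
    augmented = augmented * rng.uniform(*gain_range)              # random gain
    return np.clip(augmented, -1.0, 1.0)                          # keep samples in [-1, 1]

# Example: augment one second of a 440 Hz tone sampled at 16 kHz
t = np.linspace(0, 1, 16000, endpoint=False)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = augment_waveform(clean, rng=np.random.default_rng(0))
```

Applying such transforms on the fly means the model sees a slightly different version of each clip every epoch, which helps robustness to channel noise in real ATC audio.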
Limitations
While the fine-tuned model performs well in ATC-specific communications, it may not generalize as effectively to other domains of speech. Additionally, like most speech-to-text models, transcription accuracy can be affected by extremely poor-quality audio or heavily accented speech not encountered during training.
References
- Blog Post: [Fine-Tuning Whisper for ATC: 84% Improvement in Transcription Accuracy](https://jacktol.net/posts/fine-tuning_whisper_for_atc/)
- GitHub Repository: [Fine-Tuning Whisper on ATC Data](https://github.com/jack-tol/fine-tuning-whisper-on-atc-data/tree/main)
🔧 Technical Details
The model is a fine-tuned version of OpenAI's Whisper Medium EN, trained on ATC communication datasets. Training was carried out on two A100 GPUs with 80GB memory for 10 epochs, using a learning rate of 1e-5 and an effective batch size of 32. Dynamic data augmentation techniques were applied during training, and the evaluation metric is Word Error Rate (WER).
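For reference, WER is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. A minimal implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one dropped word out of six (the phrases are illustrative)
print(wer("cleared for takeoff runway two seven",
          "cleared takeoff runway two seven"))  # 1/6, about 0.167
```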
📄 License
The model is licensed under the MIT license.
Information Table
| Property | Details |
|----------|---------|
| Model Type | Whisper Medium EN Fine-Tuned for Air Traffic Control (ATC) - Faster-Whisper Optimized |
| Training Data | jacktol/atc-dataset (combined from the ATCO2 corpus and the UWB-ATCC corpus) |
| Base Model | openai/whisper-medium.en |
| Pipeline Tag | automatic-speech-recognition |
| Metrics | Word Error Rate (WER) |
| Results | WER = 15.08% on the ATC Dataset |
| Source | [ATC Transcription Evaluation](https://huggingface.co/jacktol/whisper-medium.en-fine-tuned-for-ATC-faster-whisper) |