Whisper Large V3 Distil It V0.2

Developed by bofenghuang
A 2-layer decoder distilled Whisper speech-to-text model optimized for Italian, improving efficiency while maintaining accuracy
Downloads 129
Release Date: 8/22/2024

Model Overview

A distilled, Italian-optimized variant of OpenAI's Whisper-Large-V3: the decoder is reduced to 2 layers, substantially increasing inference speed while preserving speech recognition accuracy. The model supports multiple inference frameworks and suits real-time speech-to-text applications.
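As a concrete example, the model can be loaded through the transformers automatic-speech-recognition pipeline. This is a minimal sketch: the Hub repo ID is inferred from the developer and model name above and may need adjusting, and the audio filename is a placeholder.

```python
# Minimal sketch: load the distilled model with the transformers ASR
# pipeline. The Hub ID below is inferred from the model name above
# (an assumption); "meeting_it.wav" is a placeholder file.
MODEL_ID = "bofenghuang/whisper-large-v3-distil-it-v0.2"  # assumed Hub ID


def build_transcriber(model_id: str = MODEL_ID):
    """Return an ASR pipeline, running on GPU when one is available."""
    # Heavy dependencies are imported lazily so the module stays importable.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        torch_dtype=torch.float16 if device != "cpu" else torch.float32,
        device=device,
    )


if __name__ == "__main__":
    asr = build_transcriber()
    print(asr("meeting_it.wav")["text"])  # prints the Italian transcript
```

The same checkpoint can also be converted for openai-whisper or faster-whisper; the pipeline route above is simply the most direct one.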

Model Features

Efficient Distilled Architecture
Retains only 2 decoder layers, reducing parameters by 51% and improving inference speed by 5.8x
Long-text Optimization
Trained with extended 30-second audio segments to maintain long-text transcription capability
Multi-framework Compatibility
Supports various inference frameworks including transformers, openai-whisper, and faster-whisper
Speculative Decoding Support
Can serve as a draft model paired with the full Whisper-Large-V3 for roughly 2x faster decoding while producing output identical to the full model

Model Capabilities

Italian speech recognition
Long audio transcription
Real-time speech-to-text
Multi-framework deployment

Use Cases

Speech Transcription
Automated Meeting Minutes
Automatically convert Italian meeting recordings into text transcripts
Achieves a lower Word Error Rate (WER) than comparable distilled models
Media Subtitle Generation
Generate accurate subtitles for Italian video content
Supports processing audio segments up to 30 seconds long
Real-time Applications
Real-time Speech Translation Frontend
Integrated as the speech recognition module in real-time translation systems
5.8x speed improvement ideal for real-time scenarios
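Since the model is trained on 30-second segments, longer recordings are typically split into windows of at most 30 seconds before transcription. The snippet below is a framework-agnostic sketch of that splitting step only; production pipelines usually add overlapping strides so words are not cut at window boundaries.

```python
# Split a recording into <=30-second windows, the segment length the
# model card cites. Offsets are in samples so the helper works with any
# audio framework. The 16 kHz default matches Whisper's input rate.
def chunk_samples(num_samples: int, sample_rate: int = 16_000, window_s: int = 30):
    """Yield (start, end) sample offsets covering the full recording."""
    window = window_s * sample_rate
    for start in range(0, num_samples, window):
        yield start, min(start + window, num_samples)


# Example: a 75-second recording at 16 kHz yields three windows.
# list(chunk_samples(75 * 16_000))
# -> [(0, 480000), (480000, 960000), (960000, 1200000)]
```

Each window is transcribed independently and the texts are concatenated; the transformers pipeline can do this internally via its `chunk_length_s` argument.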