
OWSM-CTC v3.1 1B

Developed by espnet
OWSM-CTC is an encoder-only speech foundation model based on hierarchical multi-task self-conditioned CTC, supporting multilingual speech recognition, speech translation, and language identification.
Downloads: 116
Release Time: 2/23/2024
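Because OWSM-CTC is encoder-only and trained with CTC, it scores every output token at every encoder frame instead of generating text autoregressively. The sketch below shows the generic best-path (greedy) CTC decoding rule that turns such per-frame scores into a token sequence. It is an illustration, not ESPnet's implementation. It assumes the scores arrive as plain Python lists and that index 0 is the blank symbol, and it leaves out the model's hierarchical self-conditioning. The names `ctc_greedy_decode` and `BLANK` are hypothetical.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: token index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Pick the highest-scoring token for each encoder frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Collapse runs of the same token, then discard blanks.
    tokens: List[int] = []
    prev: Optional[int] = None
    for tok in best_path:
        if tok != prev and tok != blank:
            tokens.append(tok)
        prev = tok
    return tokens


if __name__ == "__main__":
    scores = [
        [0.1, 0.8, 0.1],    # token 1
        [0.2, 0.7, 0.1],    # token 1 again (merged)
        [0.9, 0.05, 0.05],  # blank separates the repeats
        [0.1, 0.6, 0.3],    # token 1 (kept: a blank came before it)
        [0.1, 0.2, 0.7],    # token 2
        [0.2, 0.1, 0.7],    # token 2 again (merged)
        [0.8, 0.1, 0.1],    # blank
    ]
    print(ctc_greedy_decode(scores))  # [1, 1, 2]
```

The blank between the two runs of token 1 is what allows a genuinely repeated token to survive the merge step.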

Model Overview

This model was trained on 180k hours of public audio data and follows the design of the Open Whisper-style Speech Model (OWSM) project. It supports multilingual speech recognition, any-to-any speech translation, and language identification.

Model Features

Multi-task learning
Supports three tasks: speech recognition, speech translation, and language identification
Large-scale training
Trained on 180k hours of public audio data
Efficient inference
Supports batched inference and long-form audio processing
CTC forced alignment
Supports efficient timestamp alignment using ctc-segmentation
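Timestamp alignment with a CTC model works by finding which encoder frames each token of a known transcript occupies. The sketch below implements plain Viterbi CTC forced alignment. This is a simpler relative of the ctc-segmentation algorithm named above, not a reimplementation of it. ctc-segmentation additionally computes confidence scores and handles segmenting long recordings. The function name `ctc_forced_align`, the list-of-lists log-probability input, and blank index 0 are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def ctc_forced_align(
    log_probs: Sequence[Sequence[float]],  # [T][V] per-frame log-probabilities
    tokens: Sequence[int],                 # target token ids, without blanks
    blank: int = 0,                        # assumption: index 0 is the blank
) -> List[Tuple[int, int]]:
    """Viterbi CTC alignment: (first_frame, last_frame) for each target token."""
    # Interleave blanks: [blank, y1, blank, y2, ..., yN, blank].
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    S, T = len(ext), len(log_probs)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start on the leading blank or on the first token.
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Predecessors: stay, advance one state, or skip a blank
            # between two different tokens.
            preds = [p for p in (s, s - 1) if p >= 0]
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                preds.append(s - 2)
            best = max(preds, key=lambda p: score[t - 1][p])
            if score[t - 1][best] > NEG_INF:
                score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
                back[t][s] = best
    # A path must end on the last token or on the trailing blank.
    ends = [S - 1] if S == 1 else [S - 1, S - 2]
    end = max(ends, key=lambda s: score[T - 1][s])
    if score[T - 1][end] == NEG_INF:
        raise ValueError("too few frames to align this token sequence")
    # Backtrack to recover the state visited at every frame.
    path = [0] * T
    s = end
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t][s]
    # Token k sits at extended position 2k + 1.
    spans = []
    for k in range(len(tokens)):
        frames = [t for t, st in enumerate(path) if st == 2 * k + 1]
        spans.append((frames[0], frames[-1]))
    return spans


if __name__ == "__main__":
    probs = [
        [0.8, 0.1, 0.1],  # blank
        [0.1, 0.8, 0.1],  # token 1
        [0.1, 0.8, 0.1],  # token 1
        [0.8, 0.1, 0.1],  # blank
        [0.1, 0.1, 0.8],  # token 2
    ]
    log_probs = [[math.log(p) for p in row] for row in probs]
    print(ctc_forced_align(log_probs, [1, 2]))  # [(1, 2), (4, 4)]
```

To turn frame indices into seconds, multiply them by the encoder's frame shift. That value depends on the model's front end and subsampling and is not stated on this page.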

Model Capabilities

Multilingual speech recognition
Any-to-any speech translation
Language identification
Batch audio processing
Long-form audio processing via segmentation
CTC timestamp alignment
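Batch processing and long-audio segmentation usually go together: a long recording is cut into fixed-length windows, and the windows are padded into batches so the model can process several at once. The sketch below shows one generic way to do this with overlapping windows. It is not ESPnet's long-form decoding routine. The names `chunk_audio` and `make_batches` are hypothetical, as are the window, hop, and batch sizes. Merging the transcripts of overlapping windows is left out.

```python
from typing import List, Sequence, Tuple

Chunk = Tuple[int, List[float]]  # (start sample, samples in the window)


def chunk_audio(samples: Sequence[float], window: int, hop: int) -> List[Chunk]:
    """Split a long signal into windows of `window` samples, `hop` samples apart."""
    if not 0 < hop <= window:
        raise ValueError("require 0 < hop <= window")
    chunks: List[Chunk] = []
    start = 0
    while True:
        chunks.append((start, list(samples[start:start + window])))
        if start + window >= len(samples):
            break  # this window already reaches the end of the signal
        start += hop
    return chunks


def make_batches(
    chunks: List[Chunk], batch_size: int, pad_value: float = 0.0
) -> List[Tuple[List[List[float]], List[int]]]:
    """Group chunks into batches, right-padding each to the longest in its batch.

    Each batch is returned with its true lengths so padding can be masked.
    """
    batches = []
    for i in range(0, len(chunks), batch_size):
        group = [samples for _, samples in chunks[i:i + batch_size]]
        max_len = max(len(c) for c in group)
        padded = [c + [pad_value] * (max_len - len(c)) for c in group]
        batches.append((padded, [len(c) for c in group]))
    return batches
```

For example, if the input is 16 kHz audio (an assumption, not stated on this page), `chunk_audio(samples, window=30 * 16000, hop=25 * 16000)` would produce 30-second windows that overlap by 5 seconds. Keeping `start` with each chunk makes it possible to map any per-window timestamps back onto the original recording.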

Use Cases

Speech transcription
Meeting minutes transcription
Convert meeting recordings into text transcripts
Highly accurate transcripts
Speech translation
Real-time speech translation
Translate speech in one language into text in another language in real time
Smooth cross-language communication
Audio analysis
Language identification
Identify the language spoken in audio
Accurate language classification
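Language identification can be treated as classification over a fixed set of language labels. The sketch below converts per-language scores into probabilities with a numerically stable softmax and returns the most likely candidates. The language codes and scores are invented for illustration. The function name `top_languages` is hypothetical, and this page does not describe how OWSM-CTC itself produces its language scores.

```python
import math
from typing import Dict, List, Tuple


def top_languages(logits: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Softmax over per-language scores; return the k most probable languages."""
    # Subtract the largest score before exponentiating to avoid overflow.
    m = max(logits.values())
    exp = {lang: math.exp(score - m) for lang, score in logits.items()}
    total = sum(exp.values())
    probs = {lang: v / total for lang, v in exp.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical scores keyed by ISO 639-3 codes.
    scores = {"eng": 2.0, "deu": 0.5, "jpn": 0.5, "fra": -1.0}
    for lang, p in top_languages(scores, k=2):
        print(lang, round(p, 3))
```

Returning a ranked list instead of a single label lets a caller fall back to a second candidate when the top probability is low.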