OWSM-CTC v3.2 ft 1B

Developed by espnet
OWSM-CTC is an encoder-only speech foundation model based on hierarchical multitask self-conditioned CTC, supporting multilingual speech recognition, speech translation, and language identification.
Downloads 110
Release Time: 9/24/2024

Model Overview

Trained on 180k hours of public audio data, this model supports multilingual speech recognition, any-to-any speech translation, and language identification. It is part of the Open Whisper-style Speech Model (OWSM) project.
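
Because the model is encoder-only and CTC-based, text can be read from the per-frame outputs without an autoregressive decoder. The following is a minimal sketch of greedy CTC decoding in plain Python, not ESPnet's implementation. The function name `ctc_greedy_decode`, the list-of-lists score format, and the blank index 0 are assumptions made for illustration.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Greedy CTC decoding sketch (hypothetical helper, not an ESPnet API).

    frame_scores: list of per-frame score lists, one score per vocabulary id.
    blank: id of the CTC blank symbol (assumed to be 0 here).
    """
    # Pick the highest-scoring label in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    tokens = []
    prev = None
    for label in best:
        # Keep a label only when it starts a new run and is not the blank.
        if label != prev and label != blank:
            tokens.append(label)
        prev = label
    return tokens
```

A blank between two identical labels is what lets the same token appear twice in a row in the output; without the blank, the repeated frames collapse into one token.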

Model Features

Multitask Support
Simultaneously supports speech recognition, speech translation, and language identification tasks
Large-scale Training
Trained on 180k hours of public audio data
Efficient Inference
Supports batched inference and long-form audio processing
CTC Forced Alignment
Supports audio-text alignment using ctc-segmentation
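
The idea behind CTC forced alignment can be sketched as a Viterbi search over the transcript interleaved with blanks. This is a generic illustration in plain Python under stated assumptions, not the ctc-segmentation package or ESPnet's aligner: the function name `ctc_forced_align`, the nested-list log-probability format, and the blank index 0 are all hypothetical.

```python
import math


def ctc_forced_align(log_probs, tokens, blank=0):
    """Return the best frame-level label path that collapses to `tokens`.

    log_probs: list of T frames, each a list of per-label log-probabilities.
    tokens: target label ids (the known transcript).
    blank: id of the CTC blank symbol (assumed to be 0 here).
    """
    T = len(log_probs)
    if T == 0:
        raise ValueError("no frames to align")

    # Extended target: blank, tok1, blank, tok2, ..., blank.
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    S = len(ext)

    NEG = -math.inf
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]

    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][blank]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            cands = [s]  # stay in the same state
            if s >= 1:
                cands.append(s - 1)  # advance by one state
            # Skipping a blank is allowed only between two different tokens.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG:
                continue  # state not reachable yet
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the final blank or the final token.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == NEG:
        raise ValueError("too few frames for this transcript")

    # Follow the back-pointers to recover one label per frame.
    path = [blank] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path
```

Token start and end times follow from the returned path by multiplying frame indices by the model's frame shift, which depends on the encoder configuration and is not stated here.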

Model Capabilities

Multilingual speech recognition
Any-to-any speech translation
Language identification
Long audio processing
Batch inference
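
One common way to combine long audio processing with batch inference is to cut the recording into fixed-size chunks, pad each chunk with some left and right context, and decode the chunks in batches. The sketch below shows only the index bookkeeping in plain Python. It assumes this chunk-plus-context scheme for illustration; the helper names `plan_chunks` and `make_batches` and their parameters are hypothetical and do not describe ESPnet's actual interface.

```python
def plan_chunks(num_samples, chunk_len, context_len):
    """Plan overlapping reads over a long recording (hypothetical helper).

    Returns a list of (read_start, read_end, keep_start, keep_end) tuples in
    samples. The model would see [read_start, read_end); only the output for
    [keep_start, keep_end) would be kept, so context regions are discarded.
    """
    if chunk_len <= 0 or context_len < 0:
        raise ValueError("chunk_len must be positive and context_len non-negative")

    plans = []
    for keep_start in range(0, num_samples, chunk_len):
        keep_end = min(keep_start + chunk_len, num_samples)
        read_start = max(0, keep_start - context_len)
        read_end = min(num_samples, keep_end + context_len)
        plans.append((read_start, read_end, keep_start, keep_end))
    return plans


def make_batches(items, batch_size):
    """Group planned chunks into batches of at most `batch_size`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Usage: a 10-sample signal, 4-sample chunks, 1 sample of context per side.
plans = plan_chunks(10, 4, 1)
batches = make_batches(plans, 2)
```

Larger context windows give the encoder more information near chunk boundaries at the cost of redundant computation, and the batch size is bounded by available memory.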

Use Cases

Speech Transcription
Automatic Meeting Minutes Transcription
Automatically converts meeting recordings into text transcripts
Supports accurate transcription in multiple languages
Speech Translation
Real-time Speech Translation
Translates speech in one language into text in another language in real time
Supports translation between any pair of supported languages
Audio Analysis
Language Identification
Identifies the language used in audio
Can recognize multiple languages