OWSM v3.1 EBF

Developed by espnet
OWSM is an open-source Whisper-style speech model built with publicly available data and the ESPnet toolkit. It supports multilingual speech recognition, speech translation, and other tasks.
Downloads 291
Release Time: 12/22/2023

Model Overview

OWSM (Open Whisper-style Speech Model) aims to build fully open speech foundation models from publicly available data and open-source toolkits. It supports speech recognition, cross-language speech translation, sentence-level alignment, long-form transcription, and language identification.

Model Features

Open-Source Speech Foundation Model
Developed entirely using publicly available data and open-source toolkits, ensuring transparency and reproducibility.
Improved Speech Encoder
Uses an E-Branchformer encoder, which improves performance over previous OWSM versions.
Multi-Task Support
A single model handles speech recognition, translation, alignment, long-form transcription, and language identification.
Large-Scale Training Data
Trained on 180,000 hours of publicly available speech data, covering multiple languages and scenarios.

Model Capabilities

Speech Recognition
Cross-Language Speech Translation
Sentence-Level Alignment
Long-Form Transcription
Language Identification

Use Cases

Speech-to-Text
  Multilingual Speech Recognition
    Convert speech in multiple languages into corresponding text
    Supports high-quality multilingual transcription
  Speech Translation
    Directly translate speech from one language into text in another language
    Enables real-time cross-language translation

Speech Analysis
  Language Identification
    Automatically identify the language type in speech
    Accurately identifies multiple languages
  Speech Alignment
    Align speech with text temporally
    Generates precise speech-text alignment information
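
Example Usage (sketch)

The recognition and translation use cases above differ only in the language and task symbols passed to the model. The following is a minimal sketch, not an official recipe. It assumes the checkpoint is published under the tag espnet/owsm_v3.1_ebf, that ESPnet's Speech2Text class in espnet2.bin.s2t_inference accepts lang_sym and task_sym arguments, and that symbols follow the <eng> / <asr> / <st_zho> pattern with three-letter language codes. The helper names owsm_symbols and load_owsm are hypothetical and exist only for illustration.

```python
from typing import Dict, Optional


def owsm_symbols(task: str, source_lang: str,
                 target_lang: Optional[str] = None) -> Dict[str, str]:
    """Build the language/task prompt symbols for one request.

    Assumption: symbols look like "<eng>", "<asr>", "<st_zho>".
    Check the model's own token list before relying on this.
    """
    if task == "asr":
        # Recognition: transcribe in the spoken language.
        return {"lang_sym": f"<{source_lang}>", "task_sym": "<asr>"}
    if task == "st":
        # Translation: speech in source_lang, text in target_lang.
        if target_lang is None:
            raise ValueError("speech translation needs a target language")
        return {"lang_sym": f"<{source_lang}>",
                "task_sym": f"<st_{target_lang}>"}
    raise ValueError(f"unsupported task: {task}")


def load_owsm(task: str, source_lang: str,
              target_lang: Optional[str] = None, device: str = "cpu"):
    """Load the model configured for one task (hypothetical helper)."""
    # Imported lazily so owsm_symbols works without ESPnet installed.
    from espnet2.bin.s2t_inference import Speech2Text

    return Speech2Text.from_pretrained(
        model_tag="espnet/owsm_v3.1_ebf",  # assumed model tag
        device=device,
        **owsm_symbols(task, source_lang, target_lang),
    )
```

Under these assumptions, load_owsm("asr", "eng") would configure English recognition and load_owsm("st", "eng", "zho") would configure English-to-Chinese speech translation. Loading the model requires the espnet and espnet_model_zoo packages and downloads the checkpoint on first use.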