SEW-D-mid-k127-400k-ft-ls100h

Developed by asapp
SEW-D-mid-k127 is an efficient pre-trained speech recognition model developed by ASAPP Research. It demonstrates significant improvements in both performance and efficiency over wav2vec 2.0.
Release Date: 3/2/2022

Model Overview

This is a pre-trained model for automatic speech recognition (ASR), based on the SEW (Squeezed and Efficient Wav2vec) architecture. It is pre-trained on speech audio sampled at 16 kHz, so input audio should also be sampled at 16 kHz. The model requires fine-tuning for a specific task before use.
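The 16 kHz requirement above can be enforced before any audio reaches the model. The following is a minimal sketch, not an official usage recipe. It assumes the checkpoint is available on the Hugging Face Hub under the identifier shown in `MODEL_ID` (an assumption, not stated on this page) and that it carries a CTC head loadable with the `transformers` classes `SEWDForCTC` and `Wav2Vec2Processor`. The helper names `check_sample_rate` and `transcribe` are hypothetical.

```python
EXPECTED_SAMPLE_RATE = 16_000  # the model is pre-trained on 16 kHz audio

# Assumed Hugging Face Hub identifier; not confirmed by this page.
MODEL_ID = "asapp/sew-d-mid-k127-400k-ft-ls100h"


def check_sample_rate(sample_rate: int) -> None:
    """Raise ValueError if the audio is not sampled at 16 kHz."""
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample the audio first"
        )


def transcribe(waveform, sample_rate: int, model_id: str = MODEL_ID) -> str:
    """Transcribe a mono waveform (1-D sequence of floats) by greedy CTC decoding."""
    check_sample_rate(sample_rate)

    # Imported lazily so the sample-rate check works without these packages.
    import torch
    from transformers import SEWDForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = SEWDForCTC.from_pretrained(model_id)

    # Convert the raw waveform into model input tensors.
    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy decoding: take the most likely token at every frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe(waveform, 16000)` downloads the checkpoint on first use. Audio recorded at another rate, such as 8 kHz telephone speech, is rejected by `check_sample_rate` and needs resampling beforehand.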

Model Features

Efficient Architecture Design
Achieves a 1.9x inference speedup over wav2vec 2.0 while maintaining or improving recognition accuracy.
Performance Optimization
At a similar inference time, reduces word error rate by 25-50% relative to wav2vec 2.0 across different model sizes.
Multi-Task Applicability
Can be fine-tuned for downstream tasks such as automatic speech recognition, speaker recognition, intent classification, and emotion recognition.

Model Capabilities

English Speech Recognition
Speech Feature Extraction
Audio Content Transcription

Use Cases

Speech Transcription
Meeting Minutes
Automatically transcribe meeting recordings into text records.
WER 4.99 on the LibriSpeech test-clean set
Speech-to-Text Service
Provide speech-to-text conversion functionality for applications.
WER 10.95 on the LibriSpeech test-other set
Speech Analysis
Speaker Recognition
Identify and analyze speech features of different speakers.
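
The WER (word error rate) figures quoted for the transcription use cases count word-level substitutions, deletions, and insertions against a reference transcript, divided by the number of reference words. The sketch below shows that calculation in plain Python. It assumes simple whitespace tokenization with no text normalization, which may differ from the scoring setup behind the reported numbers, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" gives one deleted word out of six, a WER of about 0.167. Multiply by 100 to express it on a percentage scale.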