
SEW-D-mid-400k-ft-ls100h

Developed by asapp
SEW-D-mid is a pre-trained speech model from ASAPP Research. It targets automatic speech recognition and aims to balance recognition accuracy against inference efficiency.
Downloads: 20
Release Time: 3/2/2022

Model Overview

This model is based on the SEW-D architecture and was pre-trained on speech audio sampled at 16 kHz, so input audio should also be sampled at 16 kHz. It can be used for downstream tasks such as automatic speech recognition, speaker recognition, and intent classification.
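Because the model expects 16 kHz input, audio recorded at other rates has to be resampled first. The Python sketch below shows the idea with plain linear interpolation. The function name `resample_linear` is hypothetical and not part of any library, and the sketch omits the anti-aliasing filter that a production resampler applies before downsampling.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int = 16_000) -> list[float]:
    """Resample a mono signal by linear interpolation (no anti-aliasing filter)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    # Number of output samples covering the same duration as the input.
    n_out = round(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Position of output sample i on the input sample axis.
        pos = i * src_rate / dst_rate
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(samples[-1])  # clamp past the last input sample
            continue
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out


# Example: one second of 8 kHz audio becomes 16,000 samples at 16 kHz.
upsampled = resample_linear([0.0] * 8000, src_rate=8000)
```

For real pipelines, an established resampler such as `torchaudio.functional.resample` or `librosa.resample` is the safer choice.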

Model Features

Efficiency-Performance Balance
Achieves a 1.9x inference speedup over wav2vec 2.0 together with a 13.5% relative reduction in word error rate, under the LibriSpeech 100h-960h semi-supervised setup
Multi-task Applicability
Can be fine-tuned for a range of speech downstream tasks, such as ASR, speaker recognition, and intent classification
Optimized Architecture Design
Adopts the SEW design, which shortens ("squeezes") the sequence processed by the Transformer encoder to reduce computation; the SEW-D variant additionally uses disentangled attention
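As a rough illustration of why shortening the sequence fed to the Transformer saves compute, the sketch below counts the frames produced from one second of 16 kHz audio by a convolutional feature encoder and then halves them with average pooling. It assumes wav2vec 2.0-style kernel sizes and strides and a squeeze factor of 2; SEW-D's actual configuration may differ, and the helper names are hypothetical.

```python
# Assumption: wav2vec 2.0-style feature-encoder convolutions as a stand-in.
# SEW-D's actual extractor configuration may differ.
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)


def conv_output_length(n_samples: int, kernels=KERNELS, strides=STRIDES) -> int:
    """Frames produced by a stack of unpadded 1-D convolutions."""
    length = n_samples
    for k, s in zip(kernels, strides):
        length = (length - k) // s + 1
    return length


def squeezed_length(n_frames: int, squeeze_factor: int = 2) -> int:
    """Length after average pooling with kernel size = stride = squeeze_factor."""
    return n_frames // squeeze_factor


frames = conv_output_length(16_000)  # one second of 16 kHz audio
squeezed = squeezed_length(frames)   # assumed squeeze factor of 2
# Self-attention cost grows with the square of the sequence length.
attention_ratio = (frames / squeezed) ** 2
print(frames, squeezed, round(attention_ratio, 2))
```

Halving the sequence length cuts the cost of each self-attention layer by roughly 4x, since that cost is quadratic in length, while the linearly scaling feed-forward layers get roughly a 2x saving.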

Model Capabilities

Speech Recognition
Speech Feature Extraction
Audio Content Understanding
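For feature extraction, models of this family produce one hidden vector per short audio frame. One common (but not the only) way to use such features for utterance-level tasks like comparing speakers is to average them over time and compare the pooled vectors with cosine similarity. The sketch below assumes the frame features have already been extracted; the feature values are made up and the function names are hypothetical.

```python
import math

Frames = list[list[float]]  # one feature vector per time frame


def mean_pool(frames: Frames) -> list[float]:
    """Average frame-level features into a single utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical 4-dimensional frame features standing in for model outputs.
utt_a = [[1.0, 0.0, 2.0, 1.0], [3.0, 0.0, 0.0, 1.0], [2.0, 0.0, 1.0, 1.0]]
utt_b = [[2.0, 0.0, 1.0, 1.0], [2.0, 0.0, 1.0, 1.0]]
emb_a, emb_b = mean_pool(utt_a), mean_pool(utt_b)
print(emb_a, cosine_similarity(emb_a, emb_b))
```

Mean pooling discards temporal order, which suits utterance-level labels such as speaker identity or intent; frame-level tasks like transcription keep the per-frame vectors instead.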

Use Cases

Speech Transcription
Meeting Minutes Transcription
Automatically transcribe meeting recordings into text
Word error rate (WER) of 4.94% on the LibriSpeech test-clean set
Voice Command Recognition
Recognize and understand voice commands
Speech Analysis
Speaker Recognition
Identify speaker characteristics in speech
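The WER figure above counts word-level errors against a reference transcript. A minimal sketch of the standard computation, based on word-level Levenshtein (edit) distance, is shown below; `word_error_rate` is a hypothetical helper, not a library function.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

The result is a fraction; multiplying by 100 gives the percentage form commonly used for LibriSpeech results. Scores also depend on text normalization (casing, punctuation), so reference and hypothesis should be normalized the same way before comparison.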