Sew-Tiny-100k Open-source Speech Model - Suitable for Various Downstream Speech Tasks, Efficient and Practical

SEW-tiny-100k

Developed by asapp

SEW-tiny is a compact, efficient pretrained speech model developed by ASAPP Research. It is pretrained on speech audio sampled at 16 kHz and is intended to be fine-tuned for downstream speech tasks.

Speech Recognition

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Efficient Speech Recognition #Lightweight Pretraining #Low-resource Fine-tuning

Downloads 1,080

Release Time : 3/2/2022

Model Overview

SEW-tiny is a pretrained speech model aimed at tasks such as automatic speech recognition. Its architecture is designed to speed up inference relative to wav2vec 2.0 without giving up recognition accuracy.

Model Features

Efficient Inference

The SEW architecture achieves a 1.9x inference speedup over wav2vec 2.0 in the 100h-960h semi-supervised setup on LibriSpeech

Performance Optimization

At a similar inference time, SEW reduces word error rate by 25-50% compared to wav2vec 2.0 across different model sizes

Compressed Architecture

SEW (Squeezed and Efficient Wav2vec) is a compressed architecture designed to improve the trade-off between performance and efficiency

Model Capabilities

Speech Recognition

Speaker Recognition

Intent Classification

Emotion Recognition

Use Cases

Speech Processing

Automatic Speech Transcription

Convert speech content into text

13.5% relative reduction in word error rate compared to wav2vec 2.0 in the 100h-960h semi-supervised setup on LibriSpeech

Voice Assistants

Speech recognition module for intelligent voice assistants

🚀 SEW-tiny

A pre-trained speech model by ASAPP Research, offering performance-efficiency trade-offs for speech recognition tasks.

✨ Features

The base model is pretrained on 16kHz sampled speech audio.
It should be fine-tuned on downstream tasks such as Automatic Speech Recognition, Speaker Identification, Intent Classification, Emotion Recognition, etc.
The SEW architecture shows significant improvements in both performance and efficiency compared to wav2vec 2.0.


💻 Usage Examples

Basic Usage

See this blog for more information on how to fine-tune the model. Note that the class Wav2Vec2ForCTC has to be replaced by SEWForCTC.
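
The class swap described above might look like the following minimal sketch. It assumes the Hugging Face transformers library (with PyTorch) is installed and that the checkpoint is available on the Hub under the ID asapp/sew-tiny-100k. The helper name load_sew_for_ctc and its parameters are hypothetical; the vocabulary size and padding ID would come from a tokenizer built for your own labeled data.

```python
# Assumed Hub ID for this checkpoint; adjust if your copy lives elsewhere.
MODEL_ID = "asapp/sew-tiny-100k"


def load_sew_for_ctc(vocab_size: int, pad_token_id: int):
    """Load the pretrained SEW encoder with a fresh CTC head (hypothetical helper)."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import SEWForCTC  # used where the blog uses Wav2Vec2ForCTC

    return SEWForCTC.from_pretrained(
        MODEL_ID,
        vocab_size=vocab_size,      # size of your character/token vocabulary
        pad_token_id=pad_token_id,  # the CTC loss uses this ID as the blank token
        ctc_loss_reduction="mean",
    )
```

A call such as load_sew_for_ctc(vocab_size=32, pad_token_id=0), with placeholder values, downloads the weights on first use. The CTC output layer is newly initialized, so the model needs fine-tuning before it can transcribe anything.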

📚 Documentation

Model Information

Model Source: SEW by ASAPP Research
Paper: Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
Authors: Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi

Abstract

This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.
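
The word error rate figures in the abstract are relative reductions, which are easy to confuse with absolute percentage-point differences. The sketch below shows the arithmetic only; the function name and the two WER values are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent), chosen only to illustrate the formula.
reduction = relative_wer_reduction(baseline_wer=10.0, new_wer=8.65)
print(f"relative WER reduction: {reduction:.1%}")
```

With these illustrative numbers the absolute difference is 1.35 percentage points, while the relative reduction is that difference divided by the baseline.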

Original Model

The original model can be found under https://github.com/asappresearch/sew#model-checkpoints.

📄 License

The model is licensed under the Apache-2.0 license.

⚠️ Important Note

When using the model, make sure that your speech input is sampled at 16 kHz.
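
One way to catch a mismatched sample rate early is to inspect the file header before passing audio to the model. The following is a minimal sketch using only Python's standard wave module, so it assumes uncompressed PCM WAV input; the function names are hypothetical, and it only checks the rate rather than resampling.

```python
import wave

EXPECTED_RATE = 16_000  # Hz, the rate the model was pretrained on


def wav_sample_rate(source) -> int:
    """Return the sample rate of a WAV file given a path or binary file object."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def check_sample_rate(source) -> None:
    """Raise ValueError if the WAV audio is not sampled at 16 kHz."""
    rate = wav_sample_rate(source)
    if rate != EXPECTED_RATE:
        raise ValueError(
            f"expected {EXPECTED_RATE} Hz audio, got {rate} Hz; resample first"
        )
```

If the check fails, resample the audio with the audio library of your choice before feature extraction.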
