Speech Text

Developed by abidlabs
An automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the English Common Voice dataset. It expects English speech input sampled at 16 kHz.
Downloads 25
Release Time: 3/7/2022

Model Overview

This is an English Automatic Speech Recognition (ASR) model, fine-tuned from the XLSR-53 checkpoint, that converts English speech to text.

Model Features

High-Performance English Speech Recognition
Achieves a Word Error Rate (WER) of 19.06% and a Character Error Rate (CER) of 7.69% on the Common Voice English test set.
Language Model Enhancement Support
When combined with a language model, the Word Error Rate can be reduced to 14.81% and the Character Error Rate to 6.84%.
16kHz Sampling Rate Support
Optimized for speech input sampled at 16 kHz; audio recorded at other rates should be resampled before transcription.
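
The WER and CER figures above are both edit-distance ratios: the number of word-level (or character-level) substitutions, insertions, and deletions needed to turn the model output into the reference transcript, divided by the reference length. The following is a minimal sketch of that calculation in plain Python. The function names and the example sentences are illustrative assumptions, not part of the model's evaluation code, and published scores usually apply text normalization that is omitted here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the hypothesis
                            prev[j - 1] + cost))  # substitution or exact match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance over reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: the recognizer dropped one word out of six.
reference = "the cat sat on the mat"
hypothesis = "the cat sat on mat"
print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"CER = {cer(reference, hypothesis):.2%}")
```

Because WER counts insertions as errors, it can exceed 100% on very poor output, so it is not strictly the complement of an accuracy percentage.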

Model Capabilities

English Speech Recognition
Speech-to-Text
Automatic Speech Transcription

Use Cases

Speech Transcription
Meeting Minutes Transcription
Automatically convert English meeting recordings into text transcripts
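Word-level accuracy of roughly 81-85%, inferred from the Common Voice test results (WER 19.06% without a language model, 14.81% with one); accuracy on real meeting audio may differ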
Podcast Content Transcription
Automatically generate text transcripts for English podcasts
Voice Interface
Voice Assistant
Provide speech recognition capabilities for English voice assistants
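
Recordings from meetings, podcasts, or voice interfaces are often stored at sampling rates other than the 16 kHz this model expects, so a preprocessing step is usually needed before transcription. The sketch below, using only the Python standard library, checks a WAV file's rate and resamples a mono signal by linear interpolation. The helper names are hypothetical, and the interpolation has no anti-aliasing filter, so it only illustrates the idea; a dedicated audio library would normally be used in practice.

```python
import wave

TARGET_RATE = 16_000  # sampling rate (Hz) the model card states for input audio


def wav_sample_rate(path):
    """Return the sampling rate (Hz) recorded in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def resample_linear(samples, source_rate, target_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if source_rate == target_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# One second of silence at 48 kHz becomes one second at 16 kHz.
one_second = [0.0] * 48_000
print(len(resample_linear(one_second, 48_000)))
```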