
Wav2vec2 Conformer Rel Pos Large 960h Ft

Developed by facebook
A Wav2Vec2-Conformer model with relative position embeddings, pre-trained and fine-tuned on 960 hours of LibriSpeech speech audio sampled at 16 kHz
Downloads 1,038
Release Time: 4/18/2022

Model Overview

This is a Conformer-architecture model for automatic speech recognition (ASR). It transcribes English speech and achieves a low word error rate (WER) on LibriSpeech.
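As a rough illustration of how a model of this kind is typically called, the sketch below uses the Hugging Face `transformers` classes `Wav2Vec2Processor` and `Wav2Vec2ConformerForCTC` with greedy CTC decoding. The repository ID is an assumption inferred from the model name and developer shown on this page, and `transcribe` is a hypothetical helper, not part of any library.

```python
# Assumed Hugging Face repository ID, inferred from the page title and developer.
MODEL_ID = "facebook/wav2vec2-conformer-rel-pos-large-960h-ft"


def transcribe(waveform, sampling_rate=16_000, model_id=MODEL_ID):
    """Transcribe a mono waveform (1-D sequence of floats) to text.

    Hypothetical helper: loads the processor and model on every call,
    which is fine for a demo but wasteful in a real service.
    """
    # The model expects 16 kHz audio, so reject anything else up front.
    if sampling_rate != 16_000:
        raise ValueError("expected 16 kHz audio, got %d Hz" % sampling_rate)

    # Imported lazily so the check above works without the heavy dependencies.
    import torch
    from transformers import Wav2Vec2ConformerForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ConformerForCTC.from_pretrained(model_id)
    model.eval()

    # Normalize the raw samples and build a batch of size one.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy CTC decoding: take the most likely token at each frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe(samples)` with 16 kHz mono samples downloads the weights on first use, so it needs network access plus the `torch` and `transformers` packages.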

Model Features

Relative positional embedding
Uses relative position embeddings to improve how the model captures positional relationships in speech sequences
High accuracy
Achieves a word error rate (WER) of 1.85% on the LibriSpeech test-clean set and 3.83% on test-other
Large-scale training
Pre-trained and fine-tuned on 960 hours of LibriSpeech speech data
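For readers unfamiliar with the metric quoted above, WER is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of reference words. The following is a minimal sketch of that definition; `word_error_rate` is a hypothetical helper, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 1.85% therefore means roughly 1.85 word errors per 100 reference words.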

Model Capabilities

English speech recognition
16kHz audio processing
Long-sequence speech transcription
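Because the model is trained on 16 kHz audio, input recorded at another rate generally needs resampling before transcription. The sketch below only checks the rate of a WAV file using Python's standard `wave` module; the helper names are hypothetical, and the resampling step itself is left to an audio library of your choice.

```python
import wave

# Sample rate the model was trained on, per the description above.
EXPECTED_RATE_HZ = 16_000


def wav_sample_rate(path: str) -> int:
    """Read the sample rate (in Hz) from a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, expected_rate: int = EXPECTED_RATE_HZ) -> bool:
    """Return True if the file's sample rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate
```

Checking the header first is cheap and avoids silently feeding 44.1 kHz or 8 kHz audio to the model, which typically degrades transcription quality.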

Use Cases

Speech transcription
Meeting minutes
Automatically transcribe meeting recordings into text
Highly accurate transcriptions
Voice note conversion
Convert voice notes into editable text
Assistive technology
Real-time caption generation
Generate real-time captions for videos or live streams