wav2vec2-2-roberta-large Model - Open-source and Free Speech-to-Text, Trained on LibriSpeech Dataset

Wav2vec2 2 Roberta Large No Adapter Frozen Enc

Developed by speech-seq2seq

A speech recognition model trained on the LibriSpeech ASR dataset to convert English speech to text.

Speech Recognition

Transformers

#English Speech Recognition #LibriSpeech

Downloads 27

Release Time: 3/2/2022

Model Overview

This is an Automatic Speech Recognition (ASR) model for English speech-to-text. It is trained on the LibriSpeech dataset, which consists of clean, read English speech, so that is the kind of audio it targets.

Model Features

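Evaluation Result

Reached a Word Error Rate (WER) of 1.0008 on the LibriSpeech evaluation set at the last logged step. WER is reported here as a fraction, so a value near 1.0 means roughly as many word errors as there are reference words; this is a high error rate, not a low one.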

Optimized Training

Trained using the Adam optimizer and a linear learning rate scheduler

Mixed Precision Training

Used native AMP (automatic mixed precision) to improve training efficiency
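
To make the linear learning rate scheduler concrete, here is a minimal sketch in plain Python. The source does not state the base learning rate, the total number of steps, or whether warmup was used, so `base_lr`, `total_steps`, and `warmup_steps` below are illustrative assumptions, not the values used for this model.

```python
def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Learning rate at `step` for a linear schedule (illustrative sketch).

    All arguments are hypothetical; the model card does not list them.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Optional linear warmup from 0 up to base_lr (an assumption, not from the source).
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the shape of the schedule.
    for s in (0, 1250, 2500, 3750, 5000):
        print(s, linear_lr(s, total_steps=5000, base_lr=1e-4))
```

The rate falls in a straight line from `base_lr` to zero, so halfway through training it is half the starting value.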

Model Capabilities

English Speech Recognition

Speech-to-Text

Use Cases

Speech Transcription

Audiobook Transcription

Convert English audiobooks into text format

Meeting Minutes

Convert English meeting recordings into written transcripts

Training Loss	Epoch	Step	Validation Loss	WER
6.4796	0.28	500	10.7690	1.0
6.2294	0.56	1000	10.5096	1.0
5.7859	0.84	1500	13.7547	1.0017
6.0219	1.12	2000	15.4966	1.0007
5.9142	1.4	2500	18.5919	1.0
5.6761	1.68	3000	16.9601	1.0
5.73	1.96	3500	18.9857	1.0004
4.9793	2.24	4000	18.3202	1.0007
5.2332	2.52	4500	19.5416	1.0008
4.9792	2.8	5000	20.5959	1.0008
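
For reading the WER column, the sketch below computes word error rate in the standard way: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is a self-contained illustration, not the evaluation script used for this model, and the example sentences are made up. Because insertions count as errors, WER can exceed 1.0, which is how values such as 1.0017 arise.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences for illustration only.
    print(word_error_rate("the cat sat", "the cat sat"))
    print(word_error_rate("the cat sat", "a dog ran fast"))
```

In the second call every reference word is wrong and the hypothesis has one extra word, so the error count (4) is larger than the reference length (3).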
