wav2vec2-large-lv60h-100h-2nd-try Open-source Speech Recognition Model - Supports Accurate English Speech-to-Text Conversion

Wav2vec2 Large Lv60h 100h 2nd Try

Developed by patrickvonplaten

A wav2vec2-large-lv60 speech recognition model fine-tuned on the LibriSpeech dataset, supporting English speech-to-text tasks

Speech Recognition

Transformers

#Speech Recognition Optimization #Low-resource Fine-tuning #Dynamic Batch Padding

Downloads 20

Release Time: 3/2/2022

Model Overview

This model is a fine-tuned version of wav2vec2-large-lv60, which belongs to the wav2vec2 series released by Facebook Research. The base model was pre-trained through self-supervised learning, and this checkpoint was then fine-tuned on 100 hours of clean LibriSpeech data for English speech recognition.

Model Features

Efficient Fine-tuning

Achieves performance close to full-data fine-tuning with only 100 hours of labeled data (WER of 4.0 clean / 10.3 other on the LibriSpeech test set)

Dynamic Batch Padding

Pads each training batch only to the length of its longest sample instead of a fixed global length, which reduces wasted computation on padding and improves GPU utilization

Mixed Precision Training

Supports fp16 mixed precision training to reduce memory usage and accelerate training
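
To make the dynamic batch padding feature above concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the training code used for this model: the helper names `pad_batch` and `padded_size` and the toy lengths are hypothetical. It pads each batch to its own longest sequence and compares the total padded size against padding everything to one global maximum.

```python
from typing import List, Tuple


def pad_batch(
    batch: List[List[float]], pad_value: float = 0.0
) -> Tuple[List[List[float]], List[List[int]]]:
    """Pad every sequence to the longest sequence in THIS batch.

    Returns the padded sequences and an attention mask (1 = real, 0 = padding).
    Hypothetical helper for illustration only.
    """
    max_len = max(len(seq) for seq in batch)
    padded, mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [pad_value] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


def padded_size(lengths: List[int], batch_size: int, dynamic: bool) -> int:
    """Total number of frames processed after padding, summed over all batches."""
    global_max = max(lengths)
    total = 0
    for start in range(0, len(lengths), batch_size):
        chunk = lengths[start:start + batch_size]
        # Dynamic padding uses the per-batch maximum; static uses the global one.
        target = max(chunk) if dynamic else global_max
        total += target * len(chunk)
    return total


# Toy sequence lengths (assumed values, not taken from LibriSpeech).
lengths = [3, 4, 10, 9, 2, 3]
static_total = padded_size(lengths, batch_size=2, dynamic=False)
dynamic_total = padded_size(lengths, batch_size=2, dynamic=True)
print(static_total, dynamic_total)  # 60 34
```

In this toy example, per-batch padding processes 34 frames instead of 60. The saving grows further when samples of similar length are grouped into the same batch.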

Model Capabilities

English speech recognition

High-accuracy speech-to-text conversion

Long audio processing (supports batches up to 750 seconds)

Use Cases

Speech Transcription

Automatic Meeting Minutes Generation

Automatically converts English meeting recordings into text transcripts

Achieves a WER of 4.0 (clean) / 10.3 (other) on the LibriSpeech test set

Podcast Content Indexing

Creates searchable text indexes for English podcast episodes

LibriSpeech test set	WER
clean	4.0
other	10.3
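
For readers unfamiliar with the metric, the WER (word error rate) figures above can be understood through the following minimal sketch. It assumes the standard definition, word-level edit distance divided by the number of reference words and expressed as a percentage. The function names and example sentences are hypothetical, and this is not the evaluation script that produced the numbers reported here.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    return 100.0 * errors / words


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the").
score = wer(["the cat sat on the mat"], ["the cat sit on mat"])
print(round(score, 2))  # 33.33
```

Under this definition, a WER of 4.0 corresponds to roughly four word errors per 100 reference words.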

🚀 Fine-tuning of `wav2vec2-large-lv60`