Psst-fairseq-larger-rir Open-Source Automatic Speech Recognition Model: Fine

Psst Fairseq Larger Rir

Developed by birgermoell

This is an automatic speech recognition (ASR) model based on the Wav2vec 2.0 architecture, fine-tuned on a subset of the TIMIT dataset augmented with room impulse responses (RIR).

Speech Recognition

Transformers

English

Open Source License: Apache-2.0

#Room Impulse Response Enhancement #Phoneme-level Recognition #Low Frame Error Rate

Downloads 30

Release Time: 4/15/2022

Model Overview

A speech recognition model optimized for phoneme recognition, intended for speech processing in reverberant and noisy environments.

Model Features

RIR-enhanced Training Data

Uses TIMIT data augmented with room impulse responses (RIR) to improve the model's robustness in real-world environments.
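RIR augmentation generally means convolving a clean recording with an impulse response that captures a room's reflections. The sketch below illustrates that idea only; it is not the author's data pipeline. The helper names (`synthetic_rir`, `convolve`, `add_reverb`), the decay value, and the sine tone standing in for speech are all assumptions made for illustration.

```python
import math
import random


def synthetic_rir(length, decay, seed=0):
    """Toy impulse response: a direct path followed by exponentially decaying random reflections."""
    rng = random.Random(seed)
    rir = [rng.uniform(-1.0, 1.0) * math.exp(-decay * n) for n in range(length)]
    rir[0] = 1.0  # the direct path arrives first at full strength
    return rir


def convolve(signal, kernel):
    """Full discrete convolution; the output has len(signal) + len(kernel) - 1 samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def add_reverb(signal, rir):
    """Convolve with the RIR, trim to the original length, and rescale to the original peak."""
    wet = convolve(signal, rir)[: len(signal)]
    peak_in = max(abs(x) for x in signal)
    peak_out = max(abs(x) for x in wet)
    if peak_out == 0.0:
        return wet
    return [x * peak_in / peak_out for x in wet]


# A 440 Hz tone sampled at 16 kHz stands in for a short speech clip (hypothetical input).
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(800)]
reverberant = add_reverb(clean, synthetic_rir(length=200, decay=0.03))
```

In practice a measured RIR from a real room and an FFT-based convolution would replace the toy response and the nested loops used here.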

Wav2vec 2.0 Foundation

Fine-tuned from the Wav2vec 2.0 Large base model, inheriting its pretrained speech feature extraction.

Phoneme-level Recognition

Focuses on phoneme-level speech recognition, making it suitable for applications that require detailed speech analysis.
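Fine-tuned Wav2vec 2.0 models are commonly trained with a CTC objective, so turning per-frame scores into a phoneme sequence usually means taking the best label per frame, merging consecutive repeats, and dropping the blank symbol. The sketch below assumes this model follows that pattern; the function name `ctc_greedy_decode`, the label list, and the frame scores are hypothetical and do not reflect the model's actual vocabulary.

```python
def ctc_greedy_decode(frame_scores, labels, blank=0):
    """Pick the best label per frame, merge consecutive repeats, then drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    decoded = []
    previous = None
    for idx in best:
        # Emit a label only when it differs from the previous frame and is not blank.
        if idx != previous and idx != blank:
            decoded.append(labels[idx])
        previous = idx
    return decoded


# Hypothetical four-symbol vocabulary; index 0 is the CTC blank.
labels = ["<blank>", "k", "ae", "t"]
frames = [
    [0.10, 0.80, 0.05, 0.05],  # k
    [0.10, 0.80, 0.05, 0.05],  # k again, merged with the previous frame
    [0.70, 0.10, 0.10, 0.10],  # blank
    [0.10, 0.10, 0.70, 0.10],  # ae
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.10, 0.10, 0.10, 0.70],  # t again, merged
]
phonemes = ctc_greedy_decode(frames, labels)
print(phonemes)  # ['k', 'ae', 't']
```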

Model Capabilities

English Speech Recognition

Phoneme-level Analysis

Noisy Environment Speech Processing

Use Cases

Speech Technology Research

Phoneme Recognition Benchmark

Can serve as a baseline model for phoneme recognition tasks in comparative studies.

Validation results: PER 21.0%, FER 9.2%
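Phoneme error rate (PER) is conventionally the edit distance between the reference and predicted phoneme sequences divided by the reference length. The sketch below shows that calculation for a single utterance; the function names and the example sequences are illustrative assumptions, and the challenge's official scoring may aggregate or normalize differently.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences (substitutions, insertions, deletions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        current = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference token
                current[j - 1] + 1,      # insertion of a hypothesis token
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def phoneme_error_rate(reference, hypothesis):
    """Edit distance normalized by the number of reference phonemes."""
    if not reference:
        raise ValueError("reference must contain at least one phoneme")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one substitution (ae -> ah) and one insertion (s).
reference = ["k", "ae", "t"]
hypothesis = ["k", "ah", "t", "s"]
per = phoneme_error_rate(reference, hypothesis)
print(f"PER: {per:.1%}")  # PER: 66.7%
```

Because insertions count as errors, PER computed this way can exceed 100% when the hypothesis is much longer than the reference.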

Speech Enhancement Applications

Speech Recognition in Noisy Environments

Suitable for speech recognition in environments with echoes and noise, such as conference rooms and public spaces.

Property	Details
Model Type	Automatic Speech Recognition
Training Data	PSST Challenge data, a subset of TIMIT augmented with Room Impulse Response (RIR)
Fine-tuning Base	Wav2vec 2.0 Large, no fine-tuning
Validation PER	21.0%
Validation FER	9.2%
