Psst Fairseq Larger Rir

Developed by birgermoell
This is an automatic speech recognition (ASR) model based on the Wav2vec 2.0 architecture, fine-tuned on a subset of the TIMIT dataset augmented with room impulse responses (RIR).
Downloads 30
Release Time: 4/15/2022

Model Overview

A speech recognition model optimized for phoneme recognition, intended for speech processing in reverberant or noisy environments.

Model Features

RIR-enhanced Training Data
Trained on TIMIT data augmented with room impulse responses, which is intended to make the model more robust to reverberation in real-world rooms.
Wav2vec 2.0 Foundation
Fine-tuned from a Wav2vec 2.0 model, reusing its learned speech feature representations.
Phoneme-level Recognition
Outputs phoneme sequences rather than words, which suits applications that need detailed speech analysis.
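
RIR augmentation, as mentioned in the features above, generally means convolving a clean recording with a measured or simulated room impulse response so that the result sounds as if it were recorded in that room. The sketch below illustrates the idea in plain Python. It is not the author's training pipeline: the function names `apply_rir` and `peak_normalize` and the toy signals are hypothetical, and a real pipeline would more likely use `numpy.convolve` or `scipy.signal.fftconvolve` on full-length audio.

```python
def apply_rir(speech, rir):
    """Full linear convolution of a dry signal with a room impulse response."""
    out = [0.0] * (len(speech) + len(rir) - 1)
    for i, sample in enumerate(speech):
        for j, tap in enumerate(rir):
            # Each RIR tap adds a delayed, scaled copy of the input sample.
            out[i + j] += sample * tap
    return out


def peak_normalize(signal, peak=1.0):
    """Rescale so the largest absolute sample equals `peak` (avoids clipping)."""
    largest = max((abs(x) for x in signal), default=0.0)
    if largest == 0.0:
        return list(signal)
    return [x * peak / largest for x in signal]


# Toy data (assumed for illustration): a single click and a three-tap "room"
# consisting of the direct path plus two weaker, later reflections.
dry = [1.0, 0.0, 0.0, 0.0]
toy_rir = [1.0, 0.0, 0.5, 0.25]

wet = peak_normalize(apply_rir(dry, toy_rir))
```

The reverberant output is longer than the input by `len(rir) - 1` samples, which is the echo tail. Whether that tail is kept or trimmed to the original length is a design choice the source does not specify.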

Model Capabilities

English Speech Recognition
Phoneme-level Analysis
Noisy Environment Speech Processing

Use Cases

Speech Technology Research
Phoneme Recognition Benchmark
Can serve as a benchmark model for phoneme recognition tasks in comparative studies
Reported results: phoneme error rate (PER) 21.0%, feature error rate (FER) 9.2%
Speech Enhancement Applications
Speech Recognition in Noisy Environments
Suitable for speech recognition in environments with echoes and noise, such as conference rooms and public spaces
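
For readers comparing their own systems against the PER figure listed under Phoneme Recognition Benchmark, the following is a minimal sketch of how a phoneme error rate is commonly computed: the edit distance between reference and predicted phoneme sequences, divided by the number of reference phonemes. The source does not state how its scores were computed, so the corpus-level aggregation here is an assumption, and the function names and example transcripts are hypothetical. FER is not sketched because it needs a phoneme-to-feature mapping that the source does not provide.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1        # reference phoneme missing from hypothesis
            insertion = curr[j - 1] + 1   # extra phoneme in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Total edits over total reference phonemes (assumed corpus-level PER)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_phonemes = sum(len(r) for r in references)
    if total_phonemes == 0:
        raise ValueError("references contain no phonemes")
    return total_edits / total_phonemes


# Made-up ARPAbet-style transcripts, for illustration only.
refs = ["DH AH K AE T".split(), "S IH T".split()]
hyps = ["DH AH K AH T".split(), "S IH".split()]

per = phoneme_error_rate(refs, hyps)  # 1 substitution + 1 deletion over 8 phonemes
```

Summing edits across the whole test set before dividing weights long utterances more heavily than averaging per-utterance rates would; either convention can be defended, so the choice should be stated when reporting results.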