Psst-fairseq-rir: Open-Source Automatic Speech Recognition Model

Developed by birgermoell

This is an automatic speech recognition (ASR) model fine-tuned from Wav2vec 2.0 on a TIMIT subset augmented with room impulse responses (RIRs).

Speech Recognition

Transformers

Language: English

Open Source License: Apache-2.0

Tags: Room Impulse Response Enhancement, Phoneme Error Rate 21.8%, TIMIT Subset Fine-tuning

Downloads 30

Release Time: 4/15/2022

Model Overview

A speech recognition model for English phoneme recognition, trained on RIR-augmented data to perform well in noisy, reverberant conditions.

Model Features

Noise Robustness

Trained on RIR-augmented data, which is intended to make recognition more robust in noisy and reverberant environments.
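
A minimal sketch of what RIR augmentation involves: a clean utterance is convolved with a room impulse response so that it sounds as if it were recorded in a reverberant room. The synthetic impulse response (exponentially decaying noise), the function names, and parameters such as `rt60` are illustrative assumptions, not the pipeline used to train this model. Real pipelines usually draw on measured or simulated room responses instead.

```python
import numpy as np


def synthetic_rir(sample_rate=16000, rt60=0.4, length_s=0.5, seed=0):
    """Toy impulse response (hypothetical): white noise with an exponential decay."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * length_s)
    t = np.arange(n) / sample_rate
    # Amplitude envelope that has dropped by 60 dB once t reaches rt60 seconds.
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct (unreflected) path
    return rir / np.max(np.abs(rir))


def apply_rir(speech, rir):
    """Convolve speech with an RIR, keep the original length and peak level."""
    wet = np.convolve(speech, rir, mode="full")[: len(speech)]
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(speech)) / peak)
    return wet


# Example: add reverberation to a short 220 Hz tone standing in for speech.
sample_rate = 16000
tone = np.sin(2 * np.pi * 220 * np.arange(1600) / sample_rate)
reverberant = apply_rir(tone, synthetic_rir(sample_rate, length_s=0.05))
```

Trimming to the original length keeps the augmented audio aligned with the existing phoneme labels, at the cost of cutting off the reverberation tail.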

Phoneme-Level Recognition

Outputs phoneme sequences rather than word- or sentence-level transcripts.

Based on Wav2vec 2.0

Builds on Wav2vec 2.0's self-supervised pre-training, which allows effective fine-tuning on a small amount of labeled data.

Model Capabilities

English phoneme recognition

Noisy environment speech processing

Use Cases

Speech Technology Research

Phoneme Recognition Benchmarking

Can serve as a baseline model for phoneme recognition tasks.

Reported results: phoneme error rate (PER) 21.8%, FER 9.6%.
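
For readers unfamiliar with the metric, PER is conventionally the number of phoneme substitutions, insertions, and deletions needed to turn the model output into the reference, divided by the number of reference phonemes. The sketch below shows that standard calculation. The function names and the sample phoneme strings are hypothetical, and this is not the evaluation script behind the reported figures.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER: total edits divided by total reference phonemes."""
    total_ref = sum(len(r) for r in references)
    if total_ref == 0:
        raise ValueError("references contain no phonemes")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_ref


# Made-up example: one substitution and one deletion over 8 reference phonemes.
refs = ["DH AH K AE T".split(), "S IH T".split()]
hyps = ["DH AH K AH T".split(), "S IH".split()]
print(phoneme_error_rate(refs, hyps))  # 0.25
```

Summing edits over the whole corpus before dividing weights long utterances more heavily than averaging per-utterance rates would.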

Educational Technology

Pronunciation Assessment

Can be used to evaluate pronunciation accuracy in language learning.
