🚀 wav2vec 2.0 with CTC/Attention trained on DVoice Amharic (No LM)
This repository offers all essential tools for automatic speech recognition using an end-to-end system pretrained on an Amharic dataset within SpeechBrain. For a better experience, explore SpeechBrain.
| Property | Details |
|---|---|
| Model Type | Automatic Speech Recognition |
| Training Data | DVoice |
| Metrics | WER, CER |
| License | Apache-2.0 |
| DVoice Release | Val. CER | Val. WER | Test CER | Test WER |
|---|---|---|---|---|
| v2.0 | 6.71 | 25.50 | 6.57 | 24.92 |
🚀 Quick Start
This ASR system consists of two different yet linked components:
- A Tokenizer (unigram) that converts words into sub-word units, trained on the train transcriptions.
- An Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model (facebook/wav2vec2-large-xlsr-53) is combined with two DNN layers and fine-tuned on the Amharic dataset. The final acoustic representation is fed into the CTC greedy decoder. The system is trained on 16 kHz, single-channel recordings. The code automatically normalizes your audio (i.e., resampling + mono channel selection) when calling transcribe_file, if necessary.
📦 Installation
First, install transformers and SpeechBrain using the following command:
```bash
pip install speechbrain transformers
```
💡 Usage Tip
We recommend reading the SpeechBrain tutorials and learning more about SpeechBrain.
💻 Usage Examples
Basic Usage
```python
from speechbrain.inference.ASR import EncoderASR

asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-amharic",
    savedir="pretrained_models/asr-wav2vec2-dvoice-amharic",
)
asr_model.transcribe_file("speechbrain/asr-wav2vec2-dvoice-amharic/example_amharic.wav")
```
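As noted above, transcribe_file resamples and downmixes the input automatically. If your audio is already loaded as a tensor, a minimal sketch along these lines should also work, assuming torchaudio is available for resampling and using EncoderASR.transcribe_batch (the file path below is a placeholder):

```python
import torch
import torchaudio
from speechbrain.inference.ASR import EncoderASR

asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-amharic",
    savedir="pretrained_models/asr-wav2vec2-dvoice-amharic",
)

# Load a local file (hypothetical path) and normalize it to what the
# model expects: 16 kHz, single channel.
signal, sample_rate = torchaudio.load("my_amharic_recording.wav")
signal = signal.mean(dim=0, keepdim=True)  # downmix to mono, shape [1, time]
if sample_rate != 16000:
    signal = torchaudio.functional.resample(signal, sample_rate, 16000)

# transcribe_batch takes a batch of waveforms plus their relative lengths.
rel_lengths = torch.tensor([1.0])
predictions, tokens = asr_model.transcribe_batch(signal, rel_lengths)
print(predictions[0])
```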
Advanced Usage
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
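For example, the Basic Usage snippet above can be adapted as follows; the `run_opts` argument is the only change:

```python
from speechbrain.inference.ASR import EncoderASR

# Load the model on the GPU by passing run_opts to from_hparams.
asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-amharic",
    savedir="pretrained_models/asr-wav2vec2-dvoice-amharic",
    run_opts={"device": "cuda"},
)
asr_model.transcribe_file("speechbrain/asr-wav2vec2-dvoice-amharic/example_amharic.wav")
```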
📚 Documentation
Training
The model was trained with SpeechBrain. To train it from scratch:
- Clone SpeechBrain:
```bash
git clone https://github.com/speechbrain/speechbrain/
```
- Install it:
```bash
cd speechbrain
pip install -r requirements.txt
pip install -e .
```
- Run Training:
```bash
cd recipes/DVoice/ASR/CTC
python train_with_wav2vec2.py hparams/train_amh_with_wav2vec.yaml --data_folder=/localscratch/ALFFA_PUBLIC/ASR/AMHARIC/data/
```
You can find our training results (models, logs, etc.) here.
Limitations
⚠️ Important Note
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
About DVoice
DVoice is a community initiative aiming to provide African low-resource languages with data and models for voice technologies. It uses two approaches: the DVoice platforms (https://dvoice.ma and https://dvoice.sn), based on Mozilla Common Voice, for collecting authentic recordings, and transfer learning techniques for labeling social-media recordings. The DVoice platform currently manages 7 languages, including Darija.
For this project, AIOX Labs and the SI2M Laboratory are collaborating to build future technologies.
About AIOX Labs
Based in Rabat, London, and Paris, AIOX-Labs uses artificial intelligence technologies to meet the business needs and data projects of companies.
- It serves group growth, process optimization, and customer-experience improvement.
- AIOX-Labs operates in multiple sectors, from fintech to industry, including retail and consumer goods.
- It offers business-ready data products with a solid algorithmic base and adaptability for each client's specific needs.
- The team comprises AI doctors and business experts with a strong scientific background and international publications.
Website: [https://www.aiox-labs.com/](https://www.aiox-labs.com/)
SI2M Laboratory
The Information Systems, Intelligent Systems, and Mathematical Modeling Research Laboratory (SI2M) is an academic research laboratory of the National Institute of Statistics and Applied Economics (INSEA). Its research areas include Information Systems, Intelligent Systems, Artificial Intelligence, Decision Support, Network and System Security, and Mathematical Modeling.
Website: [SI2M Laboratory](https://insea.ac.ma/index.php/pole-recherche/equipe-de-recherche/150-laboratoire-de-recherche-en-systemes-d-information-systemes-intelligents-et-modelisation-mathematique)
About SpeechBrain
SpeechBrain is an open-source, all-in-one speech toolkit. It is simple, extremely flexible, and user-friendly, achieving competitive or state-of-the-art performance in various domains.
Website: https://speechbrain.github.io/
GitHub: https://github.com/speechbrain/speechbrain
Referencing SpeechBrain
```bibtex
@misc{SB2021,
  author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
  title = {SpeechBrain},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/speechbrain/speechbrain}},
}
```
Acknowledgements
This research was supported through computational resources of HPC-MARWAN (www.marwan.ma/hpc) provided by CNRST, Rabat, Morocco. We deeply thank this institution.