🚀 wav2vec 2.0 with CTC/Attention trained on DVoice Amharic (No LM)
This repository offers all essential tools for automatic speech recognition using an end-to-end system pretrained on an Amharic dataset within SpeechBrain. For a better experience, explore SpeechBrain.
| Property | Details |
|---|---|
| Model Type | Automatic Speech Recognition |
| Training Data | DVoice |
| Metrics | WER, CER |
| License | Apache-2.0 |
| DVoice Release | Val. CER | Val. WER | Test CER | Test WER |
|---|---|---|---|---|
| v2.0 | 6.71 | 25.50 | 6.57 | 24.92 |
🚀 Quick Start
This ASR system consists of two different yet linked components:
- A Tokenizer (unigram) that converts words into sub-word units, trained on the train transcriptions.
- An Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model (facebook/wav2vec2-large-xlsr-53) is combined with two DNN layers and fine-tuned on the Amharic dataset. The final acoustic representation is fed into the CTC greedy decoder. The system is trained on 16 kHz, single-channel recordings. The code automatically normalizes your audio (i.e., resampling + mono channel selection) when calling transcribe_file, if necessary.
📦 Installation
First, install transformers and SpeechBrain using the following command:
```bash
pip install speechbrain transformers
```
💡 Usage Tip
We recommend reading the SpeechBrain tutorials and learning more about SpeechBrain.
💻 Usage Examples
Basic Usage
```python
from speechbrain.inference.ASR import EncoderASR

asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-amharic",
    savedir="pretrained_models/asr-wav2vec2-dvoice-amharic",
)
asr_model.transcribe_file("speechbrain/asr-wav2vec2-dvoice-amharic/example_amharic.wav")
```
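As noted above, transcribe_file resamples and downmixes the input automatically. If your audio is already loaded as a tensor, a minimal sketch along these lines should also work, assuming torchaudio is available for resampling and using EncoderASR.transcribe_batch (the file path below is a placeholder):

```python
import torch
import torchaudio
from speechbrain.inference.ASR import EncoderASR

asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-amharic",
    savedir="pretrained_models/asr-wav2vec2-dvoice-amharic",
)

# Load a local file (hypothetical path) and normalize it to what the
# model expects: 16 kHz, single channel.
signal, sample_rate = torchaudio.load("my_amharic_recording.wav")
signal = signal.mean(dim=0, keepdim=True)  # downmix to mono, shape [1, time]
if sample_rate != 16000:
    signal = torchaudio.functional.resample(signal, sample_rate, 16000)

# transcribe_batch takes a batch of waveforms plus their relative lengths.
rel_lengths = torch.tensor([1.0])
predictions, tokens = asr_model.transcribe_batch(signal, rel_lengths)
print(predictions[0])
```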
Advanced Usage
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
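For example, the Basic Usage snippet above can be adapted as follows; the `run_opts` argument is the only change:

```python
from speechbrain.inference.ASR import EncoderASR

# Load the model on the GPU by passing run_opts to from_hparams.
asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-dvoice-amharic",
    savedir="pretrained_models/asr-wav2vec2-dvoice-amharic",
    run_opts={"device": "cuda"},
)
asr_model.transcribe_file("speechbrain/asr-wav2vec2-dvoice-amharic/example_amharic.wav")
```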
📚 Documentation
Training
The model was trained with SpeechBrain. To train it from scratch:
- Clone SpeechBrain:
```bash
git clone https://github.com/speechbrain/speechbrain/
```
- Install it:
```bash
cd speechbrain
pip install -r requirements.txt
pip install -e .
```
- Run Training:
```bash
cd recipes/DVoice/ASR/CTC
python train_with_wav2vec2.py hparams/train_amh_with_wav2vec.yaml --data_folder=/localscratch/ALFFA_PUBLIC/ASR/AMHARIC/data/
```
You can find our training results (models, logs, etc.) here.
Limitations
⚠️ Important Note
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
About DVoice
DVoice is a community initiative aiming to provide African low-resource languages with data and models for voice technologies. It uses two approaches: the DVoice platforms (https://dvoice.ma and https://dvoice.sn), based on Mozilla Common Voice, for collecting authentic recordings, and transfer learning techniques for labeling social-media recordings. The DVoice platform currently manages 7 languages, including Darija.
For this project, AIOX Labs and the SI2M Laboratory are collaborating to build future technologies.
About AIOX Labs
Based in Rabat, London, and Paris, AIOX-Labs uses artificial intelligence technologies to meet the business needs and data projects of companies.
- It serves group growth, process optimization, and customer-experience improvement.
- AIOX-Labs operates in multiple sectors, from fintech to industry, including retail and consumer goods.
- It offers business-ready data products with a solid algorithmic base and adaptability for each client's specific needs.
- The team comprises AI doctors and business experts with a strong scientific background and international publications.
Website: [https://www.aiox-labs.com/](https://www.aiox-labs.com/)
SI2M Laboratory
The Information Systems, Intelligent Systems, and Mathematical Modeling Research Laboratory (SI2M) is an academic research laboratory of the National Institute of Statistics and Applied Economics (INSEA). Its research areas include Information Systems, Intelligent Systems, Artificial Intelligence, Decision Support, Network and System Security, and Mathematical Modeling.
Website: [SI2M Laboratory](https://insea.ac.ma/index.php/pole-recherche/equipe-de-recherche/150-laboratoire-de-recherche-en-systemes-d-information-systemes-intelligents-et-modelisation-mathematique)
About SpeechBrain
SpeechBrain is an open-source, all-in-one speech toolkit. It is simple, extremely flexible, and user-friendly, achieving competitive or state-of-the-art performance in various domains.
Website: https://speechbrain.github.io/
GitHub: https://github.com/speechbrain/speechbrain
Referencing SpeechBrain
```bibtex
@misc{SB2021,
  author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
  title = {SpeechBrain},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/speechbrain/speechbrain}},
}
```
Acknowledgements
This research was supported through computational resources of HPC-MARWAN (www.marwan.ma/hpc) provided by CNRST, Rabat, Morocco. We deeply thank this institution.