🚀 NVIDIA Conformer-Transducer Large (ca-es)
This acoustic model, based on "NVIDIA/stt_es_conformer_transducer_large", is suitable for Bilingual Catalan-Spanish Automatic Speech Recognition.
🚀 Quick Start
"stt_ca-es_conformer_transducer_large" is an acoustic model for bilingual Catalan-Spanish Automatic Speech Recognition. To use it, first install NVIDIA NeMo, preferably after installing the latest version of PyTorch.
pip install "nemo_toolkit[all]"
To transcribe audio in Catalan or in Spanish using this model, you can follow this example:
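import nemo.collections.asr as nemo_asr
# Placeholder paths: point these at your local .nemo checkpoint and audio file.
model = "stt_ca-es_conformer_transducer_large.nemo"
audio_path = "audio.wav"
nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model)
transcription = nemo_asr_model.transcribe([audio_path])[0].text
print(transcription)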
✨ Features
- Bilingual Support: This model can transcribe speech in both Catalan and Spanish.
- Large Variant: It is a "large" variant of Conformer-Transducer, with around 120 million parameters.
- Fine-Tuned: Fine-tuned on a bilingual ca-es dataset comprising 7426 hours.
📦 Installation
To use this model, install NVIDIA NeMo. We recommend installing it after the latest version of PyTorch.
pip install "nemo_toolkit[all]"
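After installing, it can help to confirm that both packages are visible to the Python interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and it assumes the usual top-level import names `torch` (PyTorch) and `nemo` (NeMo toolkit).

```python
from importlib.util import find_spec


def missing_packages(names=("torch", "nemo")):
    """Return the top-level packages from `names` that cannot be found.

    Hypothetical helper: it only checks that each package is discoverable,
    not that its version is compatible with this model.
    """
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_packages()` returns an empty list when both packages are found, and otherwise lists the ones that still need to be installed.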
💻 Usage Examples
Basic Usage
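import nemo.collections.asr as nemo_asr
# Placeholder paths: point these at your local .nemo checkpoint and audio file.
model = "stt_ca-es_conformer_transducer_large.nemo"
audio_path = "audio.wav"
nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model)
transcription = nemo_asr_model.transcribe([audio_path])[0].text
print(transcription)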
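Because `transcribe` takes a list of file paths, several recordings can be passed in one call. The sketch below only builds that list; the helper name `collect_audio_files` and the default extensions are assumptions for illustration, not part of NeMo.

```python
from pathlib import Path


def collect_audio_files(root, extensions=(".wav", ".flac")):
    """Return sorted paths of audio files found recursively under `root`.

    Hypothetical helper: the extension list is an assumption, adjust it
    to the formats your recordings actually use.
    """
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        str(path)
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

The resulting list can then be passed in place of `[audio_path]` in the example above, for instance `nemo_asr_model.transcribe(collect_audio_files("recordings"))`, where `recordings` is a placeholder directory. Each returned item should expose `.text` in the same way as the single-file case.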
📚 Documentation
Model Description
This model transcribes speech into the lowercase Catalan and Spanish alphabet, including spaces, and was fine-tuned on a bilingual ca-es dataset comprising 7426 hours. It is a "large" variant of Conformer-Transducer, with around 120 million parameters.
See the model architecture section and NeMo documentation for complete architecture details.
Intended Uses and Limitations
This model can be used for Automatic Speech Recognition (ASR) in Catalan and Spanish. It is intended to transcribe audio files in Catalan and Spanish to plain text without punctuation.
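Since the output is lowercase text without punctuation, reference transcripts usually need the same normalization before they are compared with the model's output. The following is a minimal sketch, not the authors' preprocessing: the function name `normalize_reference` is hypothetical, and keeping the apostrophe, the Catalan middle dot, and the hyphen inside words is an assumption, because the model's exact vocabulary is not listed here. Digits are left untouched.

```python
import unicodedata

# Assumption: characters kept when they appear inside a word.
KEEP = "'·-"


def normalize_reference(text, keep=KEEP):
    """Lowercase, replace punctuation and symbols with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    chars = []
    for ch in text:
        if ch in keep or ch.isalnum():
            chars.append(ch)
        elif ch.isspace() or unicodedata.category(ch)[0] in "PS":
            chars.append(" ")
        # Any other character (e.g. control characters) is dropped.
    # Trim kept characters from token edges so a stand-alone "-" disappears.
    tokens = (token.strip(keep) for token in "".join(chars).split())
    return " ".join(token for token in tokens if token)


print(normalize_reference("¿Cómo estás?"))        # cómo estás
print(normalize_reference("Bon dia, col·lega!"))  # bon dia col·lega
```

Accented letters, "ñ" and "ç" pass through unchanged because they count as alphanumeric characters.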
Training Details
Training data
The model was trained on bilingual datasets in Catalan and Spanish totaling about 7k hours, including Parlament-Parla-v3 and Corts Valencianes, among others.
Training procedure
This model is the result of fine-tuning the base model "Nvidia/stt_es_conformer_transducer_large" by following this tutorial.
Additional Information
Author
The fine-tuning process was carried out in 2024 at the Language Technologies Unit of the Barcelona Supercomputing Center by Abir Messaoudi.
For the Valencian Catalan data, we collaborated with CENID within the framework of the ILENIA project.
Contact
For further information, please send an email to langtech@bsc.es.
Copyright
Copyright (c) 2024 by Language Technologies Unit, Barcelona Supercomputing Center.
License
CC-BY-4.0
Funding
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project ILENIA with reference 2022/TL22/00215337.
The training of the model was possible thanks to the computing time provided by Barcelona Supercomputing Center through MareNostrum 5.
📄 License
CC-BY-4.0
📄 Citation
If this model contributes to your research, please cite the work:
@misc{conformer-transducer-BSC-2024,
title={Bilingual ca-es ASR Model: stt_ca-es_conformer_transducer_large},
author={Messaoudi, Abir and Külebi, Baybars},
organization={Barcelona Supercomputing Center},
url={https://huggingface.co/projecte-aina/stt_ca-es_conformer_transducer_large},
year={2024}
}
📋 Information Table
| Property | Details |
| --- | --- |
| Model Type | Conformer-Transducer Large |
| Training Data | Bilingual datasets in Catalan and Spanish, totaling 7k hours, including Parlament-Parla-v3, Corts Valencianes, etc. |