stt_ca-es_conformer_transducer_large: an open-source model for bilingual Catalan and Spanish speech recognition.

Stt Ca Es Conformer Transducer Large

Developed by projecte-aina

A Catalan-Spanish bilingual automatic speech recognition model based on the NVIDIA Spanish model

Speech Recognition, Supports Multiple Languages #Catalan-Spanish bilingual #Conformer-Transducer architecture #7000 hours of speech training

Downloads 1,127

Release Time : 11/20/2024

Model Overview

This is a bilingual automatic speech recognition (ASR) model for Catalan and Spanish. It is built on NVIDIA's Conformer-Transducer architecture and transcribes speech into plain text without punctuation.

Model Features

Bilingual support

Handles speech recognition in both Catalan and Spanish with a single model

Large-scale training

Fine-tuned on a bilingual dataset totaling 7426 hours

High-performance architecture

Uses the large variant of the Conformer-Transducer architecture, with around 120 million parameters

Model Capabilities

Catalan speech recognition

Spanish speech recognition

Speech-to-text

Use Cases

Speech transcription

Meeting minutes

Transcribe Catalan or Spanish meeting recordings into text

Generate a plain-text transcription without punctuation

Media content processing

Process speech in media content such as broadcasts and podcasts

Generate a written record for the media content

🚀 NVIDIA Conformer-Transducer Large (ca-es)

This acoustic model, based on "NVIDIA/stt_es_conformer_transducer_large", is suitable for Bilingual Catalan-Spanish Automatic Speech Recognition.

🚀 Quick Start

To use this model, first install NVIDIA NeMo. We recommend installing it after the latest version of PyTorch.

pip install "nemo_toolkit[all]"

To transcribe audio in Catalan or in Spanish using this model, you can follow this example:

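import nemo.collections.asr as nemo_asr

# Placeholders: local path to the downloaded .nemo checkpoint and to the audio file.
model = "stt_ca-es_conformer_transducer_large.nemo"
audio_path = "audio.wav"

nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model)
transcription = nemo_asr_model.transcribe([audio_path])[0].text
print(transcription)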
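
Depending on the installed NeMo release, `transcribe` may return plain strings, hypothesis objects exposing a `.text` attribute (as the example above assumes), or, for transducer models in some older releases, a tuple whose first element holds the best hypotheses. The helper below is a minimal sketch that flattens these shapes into a list of strings. `extract_texts` is a hypothetical name, not part of NeMo, and the set of shapes it handles is an assumption rather than a documented contract.

```python
from typing import Any, List


def extract_texts(results: Any) -> List[str]:
    """Return plain transcription strings from a transcribe() result.

    Assumed shapes (not guaranteed for every NeMo version):
      * a list of str
      * a list of objects with a `.text` attribute
      * a tuple whose first element is one of the lists above
    """
    # Some transducer outputs are (best_hypotheses, all_hypotheses).
    if isinstance(results, tuple):
        results = results[0]

    texts = []
    for item in results:
        if isinstance(item, str):
            texts.append(item)
        else:
            # Hypothesis-like object: read its `.text` attribute.
            texts.append(item.text)
    return texts
```

With the model loaded as above, `extract_texts(nemo_asr_model.transcribe([audio_path]))[0]` would give the first transcription regardless of which of these shapes is returned.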

✨ Features

Bilingual Support: This model can transcribe speech in both Catalan and Spanish.
Large Variant: It is a "large" variant of Conformer-Transducer, with around 120 million parameters.
Fine-Tuned: Fine-tuned on a bilingual ca-es dataset comprising 7426 hours.
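
As a rough worked example of what "around 120 million parameters" implies for storage, the sketch below estimates the size of the weights alone at two common numeric precisions. The parameter count is the approximate figure quoted above; the estimate ignores activations, decoding state, and framework overhead, and it does not claim that the published checkpoint is stored in either format.

```python
PARAM_COUNT = 120_000_000  # approximate figure from the model description

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_size_mib(param_count: int, dtype: str) -> float:
    """Approximate size of the weights alone, in MiB."""
    return param_count * BYTES_PER_PARAM[dtype] / (1024 ** 2)


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_size_mib(PARAM_COUNT, dtype):.0f} MiB")
```

Under these assumptions the weights come to roughly 458 MiB in float32 and half that in float16.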

📚 Documentation

Model Description

This model transcribes speech into the lowercase Catalan and Spanish alphabets, including spaces, and was fine-tuned on a bilingual ca-es dataset comprising 7426 hours. It is a "large" variant of Conformer-Transducer, with around 120 million parameters. See the NeMo documentation for complete architecture details.
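
Because the model emits lowercase text without punctuation, reference transcripts usually need the same treatment before they are compared with its output. The following is a minimal sketch of such a normalizer using only the Python standard library. `normalize_transcript` is a hypothetical helper, and the set of characters it keeps (letters, digits, apostrophes, and the Catalan middle dot) is an assumption rather than the model's documented vocabulary; hyphens are dropped here, which may need adjusting.

```python
import re
import unicodedata

# Characters kept in addition to letters and digits; an assumption, adjust as needed.
KEPT_SYMBOLS = "'·"


def normalize_transcript(text: str) -> str:
    """Lowercase text and replace punctuation with spaces."""
    # Compose accents so "e" + combining acute becomes a single "é".
    text = unicodedata.normalize("NFC", text).lower()
    # Treat the typographic apostrophe as the plain ASCII one.
    text = text.replace("\u2019", "'")
    # Replace every character that is not a letter, digit, or kept symbol.
    cleaned = "".join(
        ch if ch.isalnum() or ch in KEPT_SYMBOLS else " " for ch in text
    )
    # Collapse runs of whitespace left behind by removed punctuation.
    return re.sub(r"\s+", " ", cleaned).strip()
```

For instance, `normalize_transcript("Bon dia, com estàs?")` yields `bon dia com estàs`.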

Intended Uses and Limitations

This model can be used for Automatic Speech Recognition (ASR) in Catalan and Spanish. It is intended to transcribe audio files in Catalan and Spanish to plain text without punctuation.
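
NVIDIA's Conformer-Transducer models are generally documented as taking 16 kHz mono WAV input, so it can help to inspect audio files before transcription. The sketch below uses Python's standard `wave` module (PCM WAV only) to report a file's sample rate, channel count, and duration. The 16 kHz mono expectation is an assumption carried over from the base model, not something this card states, and `check_wav` is a hypothetical helper.

```python
import wave

EXPECTED_RATE = 16_000   # Hz; assumed, following the base model's convention
EXPECTED_CHANNELS = 1    # mono; assumed


def check_wav(path: str) -> dict:
    """Read WAV header fields and flag whether they match the assumed format."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / rate
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": duration,
        "matches_expected": rate == EXPECTED_RATE and channels == EXPECTED_CHANNELS,
    }
```

Files that do not match would need resampling or downmixing with an audio tool of your choice before being passed to the model.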

Training Details

Training data

The model was trained on bilingual datasets in Catalan and Spanish totaling 7426 hours, including:

Parlament-Parla-v3
Corts Valencianes
3cat
IB3 (The datasets will be made accessible shortly.)
CIEMPIESS Light
CIEMPIESS Fem
CIEMPIESS Complementary
CIEMPIESS Balance
CHM150
TEDx Spanish
LibriVox Spanish
Wikipedia Spanish
VoxForge Spanish
Tele con Ciencia
Argentinian Spanish Speech Dataset
DIMEx100 Light
Glissando Spanish
Herico
Latino40
Common Voice 17 es

Training procedure

This model is the result of fine-tuning the base model "Nvidia/stt_es_conformer_transducer_large" by following this tutorial.

Additional Information

Author

The fine-tuning process was performed during 2024 in the Language Technologies Unit of the Barcelona Supercomputing Center by Abir Messaoudi.

For the Valencian Catalan data, we had the collaboration of CENID within the framework of the ILENIA project.

Contact

For further information, please send an email to langtech@bsc.es.

Copyright

License

CC-BY-4.0

Funding

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project ILENIA with reference 2022/TL22/00215337.

The training of the model was possible thanks to the computing time provided by Barcelona Supercomputing Center through MareNostrum 5.

📄 Citation

If this model contributes to your research, please cite the work:

@misc{conformer-transducer-BSC-2024,
      title={Bilingual ca-es ASR Model: stt_ca-es_conformer_transducer_large},
      author={Messaoudi, Abir and Külebi, Baybars},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/stt_ca-es_conformer_transducer_large},
      year={2024}
}

📋 Information Table

Property	Details
Model Type	Conformer-Transducer Large
Training Data	Bilingual datasets in Catalan and Spanish, totaling 7426 hours, including Parlament-Parla-v3, Corts Valencianes, etc.
