Open-source base_10k_8khz_pt model - Supports 8kHz and enables accurate automatic Portuguese speech recognition

Base 10k 8khz Pt

Developed by lgris

A Portuguese automatic speech recognition model fine-tuned from facebook/wav2vec2-base-10k-voxpopuli, supporting 8kHz sampling rate

Speech Recognition

Transformers

Open Source License: Apache-2.0

#Brazilian Portuguese ASR #Multi-source dataset fine-tuning #Low sampling rate adaptation

Downloads 28

Release Time: 3/2/2022

Model Overview

This is an optimized automatic speech recognition (ASR) model for Portuguese, based on the Wav2vec 2.0 architecture, fine-tuned using multiple Portuguese speech datasets.

Model Features

Multi-dataset Fine-tuning

Fine-tuned using multiple Portuguese speech datasets including CETUC, Common Voice, and Lapsbm to improve recognition accuracy

8kHz Sampling Rate Support

Accepts audio sampled at 8kHz, which suits low-bandwidth sources such as narrowband telephone recordings

Brazilian Portuguese Optimization

Fine-tuned mostly on Brazilian Portuguese speech, so it targets that variant of the language

Model Capabilities

Portuguese speech recognition

Audio-to-text conversion

Supports 8kHz sampling rate input

Use Cases

Speech Transcription

Automatic Meeting Transcription

Automatically convert Portuguese meeting recordings into text transcripts

Voice Note Conversion

Convert Portuguese voice notes into editable text

Accessibility Applications

Real-time Caption Generation

Generate real-time captions for Portuguese video content

🚀 Wav2vec 2.0 for Portuguese in 8kHz

This is a wav2vec 2.0 model fine-tuned for Portuguese speech recognition on 8kHz audio.

🚀 Quick Start

This model is fine-tuned from [facebook/wav2vec2-base-10k-voxpopuli](https://huggingface.co/facebook/wav2vec2-base-10k-voxpopuli).
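A minimal sketch of transcribing a file with the Hugging Face `transformers` speech recognition pipeline. The model id `lgris/base_10k_8khz_pt` is an assumption pieced together from the model name and developer shown on this page, and `transcribe` is a hypothetical helper name. The pipeline needs `transformers`, a PyTorch install, and `ffmpeg` for decoding audio files. Check the model's preprocessor configuration for the sampling rate it expects before relying on the output.

```python
# Assumed Hugging Face model id (model name + developer from this page).
MODEL_ID = "lgris/base_10k_8khz_pt"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the transformers ASR pipeline."""
    # Imported lazily so that defining this helper does not require
    # transformers and torch to be installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # For a file path input, the pipeline decodes the audio (via ffmpeg)
    # and returns a dict with the transcription under the "text" key.
    return asr(audio_path)["text"]
```

Calling `transcribe("gravacao.wav")` (a hypothetical file name) would download the model on first use and return the recognized text as a string.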

📚 Documentation

Datasets

The following datasets were used to fine-tune the model:

CETUC: It contains approximately 145 hours of Brazilian Portuguese speech. The speech is distributed among 50 male and 50 female speakers. Each speaker pronounces approximately 1,000 phonetically balanced sentences selected from the CETEN-Folha corpus.
Common Voice 7.0: A project proposed by the Mozilla Foundation. Its goal is to create a widely open dataset in different languages. In this project, volunteers donate and validate speech through the official site.
Lapsbm: "Falabrasil - UFPA" is a dataset used by the Fala Brasil group to benchmark ASR systems in Brazilian Portuguese. It contains 35 speakers (10 females), each pronouncing 20 unique sentences, resulting in a total of 700 utterances in Brazilian Portuguese. The audios were recorded at 22.05 kHz without environment control.
Multilingual Librispeech (MLS): A massive dataset available in many languages. It is based on audiobook recordings in the public domain like LibriVox. The dataset has a total of 6k hours of transcribed data in many languages. The Portuguese set used in this work (mostly the Brazilian variant) has approximately 284 hours of speech, obtained from 55 audiobooks read by 62 speakers.
Multilingual TEDx: A collection of audio recordings from TEDx talks in 8 source languages. The Portuguese set (mostly the Brazilian Portuguese variant) contains 164 hours of transcribed speech.
Sidney (SID): It contains 5,777 utterances recorded by 72 speakers (20 women) aged from 17 to 59 years old. Information such as place of birth, age, gender, education, and occupation is also available.
VoxForge: A project aiming to build open datasets for acoustic models. The corpus contains approximately 100 speakers and 4,130 utterances of Brazilian Portuguese, with sample rates ranging from 16kHz to 44.1kHz.
VoxPopuli
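The corpora above were recorded at several sampling rates (22.05 kHz for Lapsbm, 16kHz to 44.1kHz for VoxForge), while the model works with 8kHz audio, so recordings generally need to be downsampled first. The sketch below only works out the integer up/down factors for that conversion. The function names are hypothetical, and this is not taken from the author's training pipeline.

```python
from fractions import Fraction

TARGET_RATE_HZ = 8000  # sampling rate the model is described as supporting


def resample_factors(source_rate_hz: int, target_rate_hz: int = TARGET_RATE_HZ):
    """Return (up, down) so that target_rate = source_rate * up / down."""
    ratio = Fraction(target_rate_hz, source_rate_hz)  # reduced automatically
    return ratio.numerator, ratio.denominator


def resampled_length(num_samples: int, source_rate_hz: int,
                     target_rate_hz: int = TARGET_RATE_HZ) -> int:
    """Number of samples after rate conversion, rounded up."""
    up, down = resample_factors(source_rate_hz, target_rate_hz)
    return -(-num_samples * up // down)  # ceiling division


print(resample_factors(22050))         # Lapsbm recordings
print(resample_factors(44100))         # upper end of the VoxForge range
print(resampled_length(22050, 22050))  # one second of 22.05 kHz audio
```

The resulting pair can be passed as the `up` and `down` arguments of a polyphase resampler such as `scipy.signal.resample_poly`, which applies a low-pass filter before discarding samples.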

Metrics
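The model card lists word error rate (WER) as its metric. As a reminder of what that number means, here is a minimal pure-Python sketch: WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The function name is hypothetical, and published evaluations usually normalize text (case, punctuation) first, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("o gato preto dorme", "o gato dorme"))  # 0.25
```

Lower is better: 0.0 means the hypothesis matches the reference word for word, and values above 1.0 are possible when the hypothesis contains many extra words.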

📄 License

This project is licensed under the Apache-2.0 license.

| Property | Details |
|---|---|
| Model Type | Fine-tuned wav2vec 2.0 model for Portuguese in 8kHz |
| Training Data | CETUC, Common Voice 7.0, Lapsbm, Multilingual Librispeech (MLS), Multilingual TEDx, Sidney (SID), VoxForge, VoxPopuli |
| Metrics | wer |
| Tags | audio, speech, wav2vec2, pt, portuguese-speech-corpus, automatic-speech-recognition, PyTorch |
| License | apache-2.0 |
