Wav2vec2-large-100k-voxpopuli Open-source Portuguese Speech Recognition Model - Free and Precise Recognition of Portuguese Speech

Wav2vec2 Large 100k Voxpopuli Ft Common Voice Plus TTS Dataset Portuguese

Developed by Edresson

This is an automatic speech recognition model based on Facebook's Wav2vec2 Large 100k Voxpopuli, fine-tuned for Portuguese on Common Voice 7.0 and the TTS-Portuguese Corpus.

Speech Recognition

Transformers

Open Source License: Apache-2.0
#Portuguese speech recognition #Low word error rate #Multi-corpus training

Downloads 20

Release Time : 3/2/2022

Model Overview

This model is primarily used for automatic speech recognition tasks in Portuguese, capable of converting Portuguese speech into text.

Model Features

Portuguese Optimization

Specifically fine-tuned for Portuguese speech, improving recognition accuracy.

Multi-dataset Training

Fine-tuned on both Common Voice 7.0 and the TTS-Portuguese Corpus to improve the model's generalization.

High Performance

Achieves a word error rate of 20.39% on the Common Voice 7.0 test set.
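Word error rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below illustrates the metric on a single sentence pair. `word_error_rate` is a hypothetical helper written for illustration, not the evaluation script behind the 20.39% figure, and reported scores are normally aggregated over a whole test set rather than computed per sentence.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word ("está" vs "esta") over five reference words.
print(word_error_rate("o gato está na mesa", "o gato esta na mesa"))  # 0.2
```

A score of 20.39% therefore means roughly one word-level error for every five reference words.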

Model Capabilities

Portuguese speech recognition

Audio to text conversion

Automatic speech recognition

Use Cases

Speech transcription

Portuguese speech to text

Automatically convert Portuguese speech content into text format

Word error rate: 20.39% on the Common Voice 7.0 test set

Voice assistants

Portuguese voice command recognition

Used for developing Portuguese voice assistants and control systems

🚀 Wav2vec2 Large 100k Voxpopuli fine-tuned with Common Voice and TTS-Portuguese Corpus in Portuguese

This model is a fine-tuned version of Wav2vec2 Large 100k Voxpopuli for Portuguese, trained on Common Voice 7.0 and the TTS-Portuguese Corpus. It targets automatic speech recognition in Portuguese, converting speech to text.

🚀 Quick Start

Prerequisites

The model is loaded through the transformers library. Install it first:

pip install transformers

Loading the Model

from transformers import AutoTokenizer, Wav2Vec2ForCTC

tokenizer = AutoTokenizer.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-portuguese")
model = Wav2Vec2ForCTC.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-portuguese")

✨ Features

Multilingual Adaptability: The base model, Wav2vec2 Large 100k Voxpopuli, was pre-trained on multilingual VoxPopuli speech, which gives it a strong starting point for adapting to individual languages.
High-Quality Fine-Tuning: Fine-tuned on Common Voice 7.0 and the TTS-Portuguese Corpus, it reaches a 20.39% word error rate on the Common Voice 7.0 test set.

📦 Installation

Install the transformers library together with the packages used by the evaluation example: torchaudio for audio loading and resampling, datasets for Common Voice, and jiwer for the word error rate.

pip install transformers torchaudio datasets jiwer
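After installing, it can help to confirm that each package is importable before running the examples. The following is a small sketch using only the standard library; `missing_packages` and `REQUIRED` are hypothetical names introduced here for illustration.

```python
from importlib.util import find_spec

# Packages named in the install command above.
REQUIRED = ["transformers", "torchaudio", "datasets", "jiwer"]


def missing_packages(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```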

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer, Wav2Vec2ForCTC
  
tokenizer = AutoTokenizer.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-portuguese")
model = Wav2Vec2ForCTC.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-portuguese")
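Loading the tokenizer and model does not yet produce a transcript. The sketch below shows one way to transcribe a single audio file with greedy CTC decoding. It rests on several assumptions: `transcribe` is a hypothetical helper, the feature extractor is built locally with typical wav2vec 2.0 settings (16 kHz input, per-utterance normalization) rather than loaded from the model repository, and torch and torchaudio are installed.

```python
MODEL_ID = "Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-portuguese"
TARGET_RATE = 16_000  # wav2vec 2.0 checkpoints expect 16 kHz mono audio


def transcribe(path: str) -> str:
    """Transcribe one audio file with greedy CTC decoding (illustrative helper)."""
    # Heavy dependencies are imported here so that defining the helper stays cheap.
    import torch
    import torchaudio
    from transformers import AutoTokenizer, Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()
    # Assumed settings: constructed locally, not read from the model repository.
    feature_extractor = Wav2Vec2FeatureExtractor(
        sampling_rate=TARGET_RATE, do_normalize=True, return_attention_mask=True
    )

    waveform, rate = torchaudio.load(path)  # shape: (channels, samples)
    waveform = waveform.mean(dim=0)         # downmix to mono
    if rate != TARGET_RATE:
        waveform = torchaudio.functional.resample(waveform, rate, TARGET_RATE)

    inputs = feature_extractor(
        waveform.numpy(), sampling_rate=TARGET_RATE, padding=True, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    # Best token per frame; the tokenizer collapses repeats and removes blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return tokenizer.batch_decode(predicted_ids)[0].lower()


# Example call (the file name is a placeholder):
# print(transcribe("sample.wav"))
```

The helper reloads the model on every call to keep the sketch short; for repeated use, load the tokenizer and model once and pass them in.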

Advanced Usage

Testing with the Common Voice Dataset

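import re

import jiwer
import torchaudio
from datasets import load_dataset

# Assumption: the source snippet never defines this pattern; this punctuation filter is illustrative.
chars_to_ignore_regex = r'[,?.!;:"-]'

# data_dir points at a local Common Voice download; adjust it to the release you evaluate on.
dataset = load_dataset("common_voice", "pt", split="test", data_dir="./cv-corpus-6.1-2020-12-11")

# The snippet assumes 48 kHz clips; the model expects 16 kHz input.
resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)

def map_to_array(batch):
    speech, _ = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech.squeeze(0)).numpy()
    batch["sampling_rate"] = resampler.new_freq
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower().replace("’", "'")
    return batch

ds = dataset.map(map_to_array)
# map_to_pred must be defined before this call: it runs the model on batch["speech"] and fills "predicted" and "target".
result = ds.map(map_to_pred, batched=True, batch_size=1, remove_columns=list(ds.features.keys()))
# The source calls an undefined wer object; jiwer (installed above) computes the same metric.
print(jiwer.wer(list(result["target"]), list(result["predicted"])))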
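The evaluation snippet above calls map_to_pred without defining it. The following is a minimal sketch of such a helper, under stated assumptions: `make_map_to_pred` is a hypothetical factory, the model and tokenizer are the ones loaded under Basic Usage, and the feature extractor is a locally constructed `Wav2Vec2FeatureExtractor` with typical wav2vec 2.0 settings. Define map_to_pred before the `ds.map(map_to_pred, ...)` call.

```python
def make_map_to_pred(model, tokenizer, feature_extractor):
    """Build a batched mapping function for datasets.Dataset.map (illustrative helper)."""

    def map_to_pred(batch):
        import torch  # imported lazily so that building the helper does not require torch

        # batch["speech"] is a list of 1-D float sequences sampled at 16 kHz.
        inputs = feature_extractor(
            batch["speech"], sampling_rate=16_000, padding=True, return_tensors="pt"
        )
        with torch.no_grad():
            logits = model(**inputs).logits
        # Greedy CTC decoding: best token per frame; the tokenizer collapses repeats and blanks.
        predicted_ids = torch.argmax(logits, dim=-1)
        batch["predicted"] = [text.lower() for text in tokenizer.batch_decode(predicted_ids)]
        batch["target"] = batch["sentence"]
        return batch

    return map_to_pred


# Example wiring (assumed settings, not taken from the model repository):
# from transformers import Wav2Vec2FeatureExtractor
# feature_extractor = Wav2Vec2FeatureExtractor(
#     sampling_rate=16_000, do_normalize=True, return_attention_mask=True
# )
# map_to_pred = make_map_to_pred(model, tokenizer, feature_extractor)
```

Predictions are lowercased to match the lowercased reference sentences produced by map_to_array.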

📚 Documentation

Results

For the results, see the paper.

📄 License

This model is licensed under the Apache-2.0 license.

📋 Model Information

Property	Details
Model Type	Wav2vec2 Large 100k Voxpopuli fine-tuned for Portuguese
Training Data	Common Voice 7.0 and TTS-Portuguese Corpus
Metrics	Word Error Rate (WER)
Tags	audio, speech, wav2vec2, pt, portuguese-speech-corpus, automatic-speech-recognition, speech, PyTorch
License	Apache-2.0
