Wav2vec2 Large 100k Voxpopuli fine-tuned with Common Voice and TTS-Portuguese Corpus in Portuguese
This model is a version of Wav2vec2 Large 100k Voxpopuli fine-tuned on Portuguese using Common Voice 7.0 and the TTS-Portuguese Corpus. It targets automatic speech recognition (speech-to-text) in Portuguese.
Quick Start
Prerequisites
This model depends on the transformers library. Install it first with the following command:
pip install transformers
Loading the Model
from transformers import AutoTokenizer, Wav2Vec2ForCTC
tokenizer = AutoTokenizer.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-portuguese")
model = Wav2Vec2ForCTC.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-portuguese")
Features
- Multilingual Adaptability: Built on the pre-trained Wav2vec2 Large 100k Voxpopuli checkpoint, which gives it a base that adapts well across languages.
- High-Quality Fine-Tuning: Fine-tuned on Common Voice 7.0 and the TTS-Portuguese Corpus for Portuguese speech recognition.
Installation
Before using this model, install the transformers library and the other dependencies used in the examples:
pip install transformers torchaudio datasets jiwer
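After installing, it can be useful to confirm that the packages are importable. The following is a minimal sketch using only the Python standard library; the helper name missing_packages is hypothetical and is not part of any of these libraries.

```python
from importlib.util import find_spec

# Packages named in the install command above.
REQUIRED = ["transformers", "torchaudio", "datasets", "jiwer"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerunning the pip command above should resolve it.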
Usage Examples
Basic Usage
from transformers import AutoTokenizer, Wav2Vec2ForCTC
tokenizer = AutoTokenizer.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-portuguese")
model = Wav2Vec2ForCTC.from_pretrained("Edresson/wav2vec2-large-100k-voxpopuli-ft-Common-Voice_plus_TTS-Dataset-portuguese")
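Wav2Vec2ForCTC is a CTC model: for each audio frame it scores every vocabulary token, and a transcript is obtained by taking the best token per frame, collapsing consecutive repeats, and dropping the blank token. The sketch below illustrates that greedy decoding step in plain Python. The toy vocabulary, the blank id 0, and the "|" word delimiter are assumptions chosen for illustration, not the model's actual vocabulary; in practice the tokenizer's own decoding performs this step.

```python
from itertools import groupby

# Toy vocabulary (hypothetical): id 0 is the CTC blank, "|" separates words.
ID_TO_TOKEN = {0: "<pad>", 1: "|", 2: "o", 3: "l", 4: "á", 5: "m", 6: "u", 7: "n", 8: "d"}
BLANK_ID = 0


def greedy_ctc_decode(frame_ids, id_to_token=ID_TO_TOKEN, blank_id=BLANK_ID):
    """Collapse repeated ids, drop blanks, and join the tokens into text."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    return "".join(tokens).replace("|", " ").strip()


# Per-frame best ids, as an argmax over the model's logits would produce.
frames = [2, 2, 0, 3, 3, 4, 0, 1, 1, 5, 6, 6, 0, 7, 8, 2, 0]
print(greedy_ctc_decode(frames))  # prints "olá mundo"
```

Note that a blank between two identical ids keeps both, which is how CTC represents doubled letters.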
Advanced Usage
Testing with the Common Voice Dataset
import re
import torchaudio
from datasets import load_dataset

# NOTE: chars_to_ignore_regex, map_to_pred, and wer are used below but are not defined in this snippet; define them before running.
dataset = load_dataset("common_voice", "pt", split="test", data_dir="./cv-corpus-6.1-2020-12-11")
# Common Voice clips are sampled at 48 kHz; the model expects 16 kHz audio.
resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)

def map_to_array(batch):
    # Load the clip, resample it, and normalize the reference transcript.
    speech, _ = torchaudio.load(batch["path"])
    batch["speech"] = resampler.forward(speech.squeeze(0)).numpy()
    batch["sampling_rate"] = resampler.new_freq
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower().replace("’", "'")
    return batch

ds = dataset.map(map_to_array)
result = ds.map(map_to_pred, batched=True, batch_size=1, remove_columns=list(ds.features.keys()))
print(wer.compute(predictions=result["predicted"], references=result["target"]))
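The evaluation above normalizes the reference transcripts and then reports the word error rate (WER): the word-level edit distance between reference and prediction, divided by the number of reference words. The following is a self-contained sketch of both steps, not the implementation used by the metric library. The CHARS_TO_IGNORE pattern is a hypothetical stand-in, since the snippet above does not show the contents of chars_to_ignore_regex.

```python
import re

# Hypothetical punctuation pattern; the source does not define chars_to_ignore_regex.
CHARS_TO_IGNORE = r'[,?.!;:"“”]'


def normalize(sentence):
    """Strip ignored characters, lowercase, and unify apostrophes."""
    return re.sub(CHARS_TO_IGNORE, "", sentence).lower().replace("’", "'")


def word_error_rate(reference, prediction):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), prediction.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the distance between the ref prefix seen so far and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)


reference = normalize("Olá, mundo! Como vai você?")
prediction = "olá mundo como vai"
print(word_error_rate(reference, prediction))  # prints 0.2 (one deletion over five reference words)
```

Because the denominator is the reference length, WER can exceed 1.0 when the prediction contains many inserted words.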
Documentation
Results
For the results, see the accompanying paper.
License
This model is licensed under the Apache-2.0 license.
Model Information
| Property | Details |
|---|---|
| Model Type | Wav2vec2 Large 100k Voxpopuli fine-tuned in Portuguese |
| Training Data | Common Voice 7.0 and TTS-Portuguese Corpus |
| Metrics | Word Error Rate (WER) |
| Tags | audio, speech, wav2vec2, pt, portuguese-speech-corpus, automatic-speech-recognition, PyTorch |
| License | apache-2.0 |