🚀 Portuguese Medium Whisper
This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the common_voice_11_0 dataset and achieves strong results on Portuguese speech recognition.
🚀 Quick Start
Fine-tuned from [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the common_voice_11_0 dataset, this model reaches a best evaluation WER of 6.5987 (see the training results table under Technical Details).
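A minimal transcription sketch with the 🤗 Transformers `pipeline` API is shown below; the repository id and the audio file name are assumptions, so replace them with this model's actual Hub id and your own audio file.

```python
# Minimal transcription sketch (assumed repo id and audio path).
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="pierreguillou/whisper-medium-portuguese",  # assumed Hub id; replace if different
    chunk_length_s=30,  # split long audio into 30-second chunks
)

result = transcriber("audio.mp3")  # placeholder path to a Portuguese audio file
print(result["text"])
```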
📚 Documentation
Blog post
All the information about this model can be found in this blog post: [Speech-to-Text & IA | Transcreva qualquer áudio para o português com o Whisper (OpenAI)... sem nenhum custo!](https://medium.com/@pierre_guillou/speech-to-text-ia-transcreva-qualquer-%C3%A1udio-para-o-português-com-o-whisper-openai-sem-ad0c17384681).
New SOTA
The normalized WER reported in the OpenAI Whisper paper on the [Common Voice 9.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0) test set is 8.1.
Since that test set is similar to the [Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) test set used to evaluate our model (WER and normalized WER), our Portuguese Medium Whisper transcribes Portuguese audio to text better than the original [Whisper Medium](https://huggingface.co/openai/whisper-medium) model, and even better than [Whisper Large](https://huggingface.co/openai/whisper-large), which has a normalized WER of 7.1.
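For reference, a normalized WER of this kind can be computed with the `evaluate` library and the `BasicTextNormalizer` that ships with 🤗 Transformers. This is a sketch of the metric on toy strings, not the exact evaluation script used for this model.

```python
# Sketch of a normalized WER computation (not the exact script used here).
import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

wer = evaluate.load("wer")
normalizer = BasicTextNormalizer()

references = ["O gato dorme no sofá."]   # ground-truth transcript (toy example)
predictions = ["o gato dorme no sofá"]   # model output (toy example)

# Both sides are normalized (lowercasing, punctuation stripping) before
# scoring, which is what distinguishes "WER Norm" from plain WER.
score = wer.compute(
    references=[normalizer(r) for r in references],
    predictions=[normalizer(p) for p in predictions],
)
print(100 * score)
```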

🔧 Technical Details
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 9e-06
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 6000
- mixed_precision_training: Native AMP
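As a rough illustration, these values map onto `Seq2SeqTrainingArguments` as in the sketch below. The `output_dir` is a placeholder, and any argument not listed above is an assumption; the Adam betas and epsilon are already the Trainer defaults, so they are not set explicitly.

```python
# Sketch mapping the listed hyperparameters onto Seq2SeqTrainingArguments.
# output_dir is a placeholder; betas=(0.9,0.999) and epsilon=1e-08 are the
# default Adam settings of the Trainer, so they are left implicit.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-pt",   # placeholder
    learning_rate=9e-06,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=6000,
    fp16=True,                          # mixed precision (native AMP)
)
```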
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.0333        | 2.07  | 1500 | 0.2073          | 6.9770 |
| 0.0061        | 5.05  | 3000 | 0.2628          | 6.5987 |
| 0.0007        | 8.03  | 4500 | 0.2960          | 6.6979 |
| 0.0004        | 11.0  | 6000 | 0.3212          | 6.6794 |
Framework versions
- Transformers 4.26.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2
📄 License
This model is licensed under the Apache 2.0 license.
| Property      | Details                                                                  |
|:--------------|:-------------------------------------------------------------------------|
| Model Type    | Portuguese Medium Whisper, a fine-tuned version of openai/whisper-medium |
| Training Data | mozilla-foundation/common_voice_11_0                                     |
| Metrics       | wer                                                                      |