# shivam/wav2vec2-xls-r-hindi
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - HI dataset. It is intended for automatic speech recognition (ASR), i.e. transcribing spoken Hindi to text.
## Quick Start
This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - HI dataset.
Its results on the evaluation set are reported in the training results table under Documentation below.
## Usage Examples
### Basic Usage
Basic usage consists of loading the model and processor, reading an audio clip, and decoding the model's CTC output. A simple example, assuming the transformers, datasets, torch, and soundfile libraries are installed:
```python
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
from datasets import load_dataset
import soundfile as sf

# Load the fine-tuned model and its processor from the Hub
model = Wav2Vec2ForCTC.from_pretrained("shivam/wav2vec2-xls-r-hindi")
processor = Wav2Vec2Processor.from_pretrained("shivam/wav2vec2-xls-r-hindi")

# Read each referenced audio file into a float array
def map_to_array(batch):
    speech, _ = sf.read(batch["path"])
    batch["speech"] = speech
    return batch

dataset = load_dataset("mozilla-foundation/common_voice_7_0", "hi", split="test[:10]")
dataset = dataset.map(map_to_array)

# The model expects 16 kHz input; Common Voice clips are recorded at
# 48 kHz, so resample first if needed.
input_values = processor(
    dataset["speech"][0], sampling_rate=16_000, return_tensors="pt"
).input_values

# Greedy CTC decoding of the logits
with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0])
print(transcription)
```
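Since the card reports WER and CER, you may also want to score a transcription against its reference text. A minimal sketch continuing from the snippet above, using the `evaluate` library (illustrative only; this is not the card's own `./eval.py`):

```python
import evaluate

# Load standard word- and character-error-rate metrics
wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Common Voice stores the reference transcript in the "sentence" column;
# `transcription` comes from the basic-usage snippet above.
predictions = [transcription]
references = [dataset["sentence"][0]]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```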
## Documentation
### Evaluation results on Common Voice 7 "test" (running ./eval.py)
#### With LM
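The with-LM figures are not reproduced here. For reference, LM-boosted decoding of this model's CTC logits is typically done with pyctcdecode and a KenLM n-gram model. A hedged sketch, continuing from the basic-usage snippet; the `hindi_5gram.arpa` file is a hypothetical language model, and special tokens in the vocabulary may need remapping in practice:

```python
from pyctcdecode import build_ctcdecoder

# Build a beam-search decoder over the tokenizer's vocabulary, ordered by token id
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

# "hindi_5gram.arpa" is a hypothetical KenLM language-model file
decoder = build_ctcdecoder(labels, kenlm_model_path="hindi_5gram.arpa")

# Decode the logits from the basic-usage snippet with LM shallow fusion
text_with_lm = decoder.decode(logits[0].cpu().numpy())
print(text_with_lm)
```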
### Model description
More information needed
### Intended uses & limitations
More information needed
### Training and evaluation data
More information needed
### Training procedure
#### Training hyperparameters
The following hyperparameters were used during training (a rough mapping onto `TrainingArguments` follows the list):
- learning_rate: 7.5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 50.0
- mixed_precision_training: Native AMP
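As a reference point, these settings map roughly onto `transformers.TrainingArguments` as shown below. This is a hedged sketch, not the actual training script (which is not part of this card); the output path is hypothetical:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-hindi",  # hypothetical output path
    learning_rate=7.5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective train batch size: 8 * 4 = 32
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=50.0,
    fp16=True,  # native AMP mixed-precision training
)
```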
#### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|---------------|-------|------|-----------------|-----|
| 5.3155 | 3.4 | 500 | 4.5582 | 1.0 |
| 3.3369 | 6.8 | 1000 | 3.4269 | 1.0 |
| 2.1785 | 10.2 | 1500 | 1.7191 | 0.8831 |
| 1.579 | 13.6 | 2000 | 1.3604 | 0.7647 |
| 1.3773 | 17.01 | 2500 | 1.2737 | 0.7519 |
| 1.3165 | 20.41 | 3000 | 1.2457 | 0.7401 |
| 1.2274 | 23.81 | 3500 | 1.3617 | 0.7301 |
| 1.1787 | 27.21 | 4000 | 1.2068 | 0.7010 |
| 1.1467 | 30.61 | 4500 | 1.2416 | 0.6946 |
| 1.0801 | 34.01 | 5000 | 1.2312 | 0.6990 |
| 1.0709 | 37.41 | 5500 | 1.2984 | 0.7138 |
| 1.0307 | 40.81 | 6000 | 1.2049 | 0.6871 |
| 1.0003 | 44.22 | 6500 | 1.1956 | 0.6841 |
| 1.004 | 47.62 | 7000 | 1.2101 | 0.6793 |
#### Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu113
- Datasets 1.18.1.dev0
- Tokenizers 0.11.0
## License
This model is licensed under the Apache-2.0 license.
## Model Information
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of facebook/wav2vec2-xls-r-300m for Hindi automatic speech recognition |
| Training Data | mozilla-foundation/common_voice_7_0 (Hindi subset) |
| Metrics | Wer, Cer |
| Model Name | shivam/wav2vec2-xls-r-hindi |