# Wav2Vec2 Vakyansh Tamil Model
This is a fine-tuned speech recognition model for Tamil, leveraging a multilingual pretrained base to achieve high-quality speech-to-text conversion.
## 🚀 Quick Start
This model is fine-tuned on the multilingual pretrained model CLSRIL-23. The original fairseq checkpoint can be found [here](https://github.com/Open-Speech-EkStep/vakyansh-models). When using this model, ensure that your speech input is sampled at 16 kHz.
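If your audio was recorded at a different rate, resample it to 16 kHz before feeding it to the model. A minimal sketch using plain NumPy linear interpolation (for real use, prefer `torchaudio.transforms.Resample` or `librosa.resample`, which apply proper anti-aliasing filters):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Interpolate a mono waveform onto a 16 kHz time grid (rough sketch only)."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    # Sample times of the original and target grids.
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)

# One second of 48 kHz audio becomes 16,000 samples.
one_second = np.zeros(48_000)
print(len(resample_to_16k(one_second, 48_000)))  # 16000
```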
## ⚠️ Important Note
The output of this model is produced without a language model, so you may observe a higher Word Error Rate (WER) in some cases.
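WER counts word-level substitutions, insertions, and deletions against a reference transcript, divided by the number of reference words. A minimal sketch of the metric (the evaluation section below uses the `datasets` library's `wer` metric instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[-1][-1] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat"))           # 0.0
print(round(wer("the cat sat", "the bat sat"), 2)) # 0.33
```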
## ✨ Features
- Multilingual Base: built on a multilingual pretrained model, which improves generalization.
- Tamil-Specific: fine-tuned for the Tamil language, providing better performance on Tamil speech recognition.
## 📦 Installation
No specific installation steps are provided in the original README. To run the examples below you will need the `transformers`, `torch`, `soundfile`, `torchaudio`, and `datasets` Python packages.
## 💻 Usage Examples
### Basic Usage
The model can be used directly (without a language model) as follows:
```python
import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor


def parse_transcription(wav_file):
    # Load the processor and fine-tuned model from the Hugging Face Hub.
    processor = Wav2Vec2Processor.from_pretrained("Harveenchadha/vakyansh-wav2vec2-tamil-tam-250")
    model = Wav2Vec2ForCTC.from_pretrained("Harveenchadha/vakyansh-wav2vec2-tamil-tam-250")

    # Read the audio file; the model expects 16 kHz mono input.
    audio_input, sample_rate = sf.read(wav_file)
    input_values = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt").input_values

    # Greedy decoding: most likely token per frame, then CTC collapse.
    with torch.no_grad():
        logits = model(input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.decode(predicted_ids[0], skip_special_tokens=True)
    print(transcription)
```
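The `torch.argmax` followed by `processor.decode` step above performs greedy CTC decoding: pick the most likely token id per frame, collapse consecutive repeats, then drop blank tokens. A toy sketch of that collapse rule, assuming blank id 0 (the real blank id comes from the model's vocabulary):

```python
def ctc_greedy_decode(ids, blank_id=0):
    """Collapse repeated ids, then drop blanks (greedy CTC rule)."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# Frames "1 1 <blank> 2 3 3 <blank> 3 4" decode to the sequence 1 2 3 3 4:
# the blank between the two 3s preserves the doubled token.
print(ctc_greedy_decode([1, 1, 0, 2, 3, 3, 0, 3, 4]))  # [1, 2, 3, 3, 4]
```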
## 📚 Documentation
### Dataset
This model was trained on 4200 hours of Hindi Labelled Data. As of now, the labelled data is not publicly available.
### Training Script
Models were trained using an experimental platform set up by the Vakyansh team at Ekstep. You can find the [training repository](https://github.com/Open-Speech-EkStep/vakyansh-wav2vec2-experimentation).
If you want to explore the training logs on wandb, they are [here](https://wandb.ai/harveenchadha/tamil-finetuning-multilingual).
### Colab Demo
You can try the Colab Demo.
### Evaluation
The model can be evaluated on the Tamil test data of Common Voice as follows:
```python
import re

import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("common_voice", "ta", split="test")
wer = load_metric("wer")

processor = Wav2Vec2Processor.from_pretrained("Harveenchadha/vakyansh-wav2vec2-tamil-tam-250")
model = Wav2Vec2ForCTC.from_pretrained("Harveenchadha/vakyansh-wav2vec2-tamil-tam-250")
model.to("cuda")

# Common Voice audio is 48 kHz; the model expects 16 kHz.
resampler = torchaudio.transforms.Resample(48_000, 16_000)
chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\’]'

def speech_file_to_array_fn(batch):
    # Strip punctuation / lowercase the reference, and decode + resample the audio.
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)

def evaluate(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to("cuda")).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids, skip_special_tokens=True)
    return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
```
**Test Result**: 53.64 %
You can also check the Colab Evaluation.
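Note that the references are normalised before scoring: `chars_to_ignore_regex` strips punctuation and the text is lowercased, so the reported WER is computed on cleaned text. A standalone illustration of that cleanup step:

```python
import re

# Same pattern as the evaluation script: punctuation to strip before scoring.
chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\’]'

def normalize(sentence: str) -> str:
    """Remove ignored punctuation and lowercase the sentence."""
    return re.sub(chars_to_ignore_regex, '', sentence).lower()

print(normalize('Hello, World!'))  # hello world
```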
## 🔧 Technical Details
The model is fine-tuned on the multilingual pretrained model CLSRIL-23. It uses the wav2vec 2.0 architecture for speech recognition.
## 📄 License
This model is licensed under the MIT license.
## 📦 Model Information
| Property | Details |
|----------|---------|
| Model Type | Wav2Vec2 Vakyansh Tamil Model |
| Training Data | 4200 hours of Hindi labelled data (not publicly available) |
| Metrics | Word Error Rate (WER) |
| Tags | audio, automatic-speech-recognition, speech |
| Model Name | Wav2Vec2 Vakyansh Tamil Model by Harveen Chadha |
| Results Dataset | Common Voice ta |
| Test WER | 53.64 |
## 🙏 Credits
Thanks to the Ekstep Foundation for making this possible. The Vakyansh team aims to open-source speech models in all Indic languages.