# Wav2Vec2 Vakyansh Tamil Model
This is a fine-tuned speech recognition model for Tamil, leveraging a multilingual pretrained base to achieve high-quality speech-to-text conversion.
## 🚀 Quick Start
This model is fine-tuned on the multilingual pretrained model CLSRIL-23. The original fairseq checkpoint can be found [here](https://github.com/Open-Speech-EkStep/vakyansh-models). When using this model, ensure that your speech input is sampled at 16 kHz.
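If your audio was recorded at a different rate, resample it to 16 kHz before feeding it to the model. A minimal sketch using plain NumPy linear interpolation (for real use, prefer `torchaudio.transforms.Resample` or `librosa.resample`, which apply proper anti-aliasing filters):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Interpolate a mono waveform onto a 16 kHz time grid (rough sketch only)."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    # Sample times of the original and target grids.
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)

# One second of 48 kHz audio becomes 16,000 samples.
one_second = np.zeros(48_000)
print(len(resample_to_16k(one_second, 48_000)))  # 16000
```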
## ⚠️ Important Note
The output of this model is produced without a language model, so you may observe a higher Word Error Rate (WER) in some cases.
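WER counts word-level substitutions, insertions, and deletions against a reference transcript, divided by the number of reference words. A minimal sketch of the metric (the evaluation section below uses the `datasets` library's `wer` metric instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[-1][-1] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat"))           # 0.0
print(round(wer("the cat sat", "the bat sat"), 2)) # 0.33
```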
## ✨ Features
- Multilingual Base: built on a multilingual pretrained model, which improves generalization.
- Tamil-Specific: fine-tuned for the Tamil language, providing better performance on Tamil speech recognition.
## 📦 Installation
No specific installation steps are provided in the original README. To run the examples below you will need the `transformers`, `torch`, `soundfile`, `torchaudio`, and `datasets` Python packages.
## 💻 Usage Examples
### Basic Usage
The model can be used directly (without a language model) as follows:
```python
import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor


def parse_transcription(wav_file):
    # Load the processor and fine-tuned model from the Hugging Face Hub.
    processor = Wav2Vec2Processor.from_pretrained("Harveenchadha/vakyansh-wav2vec2-tamil-tam-250")
    model = Wav2Vec2ForCTC.from_pretrained("Harveenchadha/vakyansh-wav2vec2-tamil-tam-250")

    # Read the audio file; the model expects 16 kHz mono input.
    audio_input, sample_rate = sf.read(wav_file)
    input_values = processor(audio_input, sampling_rate=sample_rate, return_tensors="pt").input_values

    # Greedy decoding: most likely token per frame, then CTC collapse.
    with torch.no_grad():
        logits = model(input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.decode(predicted_ids[0], skip_special_tokens=True)
    print(transcription)
```
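The `torch.argmax` followed by `processor.decode` step above performs greedy CTC decoding: pick the most likely token id per frame, collapse consecutive repeats, then drop blank tokens. A toy sketch of that collapse rule, assuming blank id 0 (the real blank id comes from the model's vocabulary):

```python
def ctc_greedy_decode(ids, blank_id=0):
    """Collapse repeated ids, then drop blanks (greedy CTC rule)."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# Frames "1 1 <blank> 2 3 3 <blank> 3 4" decode to the sequence 1 2 3 3 4:
# the blank between the two 3s preserves the doubled token.
print(ctc_greedy_decode([1, 1, 0, 2, 3, 3, 0, 3, 4]))  # [1, 2, 3, 3, 4]
```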
## 📚 Documentation
### Dataset
This model was trained on 4200 hours of Hindi Labelled Data. As of now, the labelled data is not publicly available.
### Training Script
Models were trained using an experimental platform set up by the Vakyansh team at Ekstep. You can find the [training repository](https://github.com/Open-Speech-EkStep/vakyansh-wav2vec2-experimentation).
If you want to explore the training logs on wandb, they are [here](https://wandb.ai/harveenchadha/tamil-finetuning-multilingual).
### Colab Demo
You can try the Colab Demo.
### Evaluation
The model can be evaluated on the Tamil test data of Common Voice as follows:
```python
import re

import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("common_voice", "ta", split="test")
wer = load_metric("wer")

processor = Wav2Vec2Processor.from_pretrained("Harveenchadha/vakyansh-wav2vec2-tamil-tam-250")
model = Wav2Vec2ForCTC.from_pretrained("Harveenchadha/vakyansh-wav2vec2-tamil-tam-250")
model.to("cuda")

# Common Voice audio is 48 kHz; the model expects 16 kHz.
resampler = torchaudio.transforms.Resample(48_000, 16_000)
chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\’]'

def speech_file_to_array_fn(batch):
    # Strip punctuation / lowercase the reference, and decode + resample the audio.
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)

def evaluate(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to("cuda")).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids, skip_special_tokens=True)
    return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
```
**Test Result**: 53.64 %
You can also check the Colab Evaluation.
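Note that the references are normalised before scoring: `chars_to_ignore_regex` strips punctuation and the text is lowercased, so the reported WER is computed on cleaned text. A standalone illustration of that cleanup step:

```python
import re

# Same pattern as the evaluation script: punctuation to strip before scoring.
chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\’]'

def normalize(sentence: str) -> str:
    """Remove ignored punctuation and lowercase the sentence."""
    return re.sub(chars_to_ignore_regex, '', sentence).lower()

print(normalize('Hello, World!'))  # hello world
```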
## 🔧 Technical Details
The model is fine-tuned on the multilingual pretrained model CLSRIL-23. It uses the wav2vec 2.0 architecture for speech recognition.
## 📄 License
This model is licensed under the MIT license.
## 📦 Model Information
| Property | Details |
|----------|---------|
| Model Type | Wav2Vec2 Vakyansh Tamil Model |
| Training Data | 4200 hours of Hindi labelled data (not publicly available) |
| Metrics | Word Error Rate (WER) |
| Tags | audio, automatic-speech-recognition, speech |
| Model Name | Wav2Vec2 Vakyansh Tamil Model by Harveen Chadha |
| Results Dataset | Common Voice ta |
| Test WER | 53.64 |
## 🙏 Credits
Thanks to the Ekstep Foundation for making this possible. The Vakyansh team aims to open-source speech models in all Indic languages.