wav2vec2-large-xlsr-tamil Open Source Model - Free Implementation of Automatic Tamil Speech Recognition

Wav2vec2 Large Xlsr Tamil

Developed by Thanish

Tamil automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53

Category: Speech Recognition
License: Apache-2.0
Tags: Tamil speech recognition, XLSR fine-tuned model, No language model dependency

Downloads 86

Release Time : 3/2/2022

Model Overview

This model is an automatic speech recognition system for Tamil, fine-tuned on the Common Voice dataset and intended for Tamil speech-to-text tasks.

Model Features

Tamil optimization: fine-tuned specifically on Tamil speech to improve recognition accuracy for the language.

16 kHz input: expects audio sampled at 16 kHz; audio at other rates must be resampled first.

No language model required: produces transcriptions directly, without an additional language model.

Model Capabilities

Tamil speech recognition

Speech-to-text

Automatic speech recognition

Use Cases

Speech transcription (Tamil speech transcription): converts Tamil speech content into text.

Voice assistants (Tamil voice interaction): provides speech recognition capability for Tamil voice assistants.

🚀 Wav2Vec2-Large-XLSR-53-Tamil

This model is fine-tuned from facebook/wav2vec2-large-xlsr-53 on Tamil using the Common Voice dataset. It's designed for speech recognition tasks in Tamil.

📋 Model Information

Property	Details
Model Type	thanish wav2vec2-large-xlsr-tamil
Training Data	Common Voice (train, validation)
License	apache-2.0
Metrics	WER (Test WER: 100.00 on Common Voice ta)

⚠️ Important Note

When using this model, make sure that your speech input is sampled at 16kHz.
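
For WAV input, the sampling rate can be read from the file header before transcription. The following is a minimal sketch using only the Python standard library; the helper names `wav_sample_rate` and `make_silent_wav` are illustrative and not part of the model card. It only detects a mismatch. Resampling itself still has to be done with an audio library, and compressed formats such as MP3 are outside what the `wave` module can read.

```python
import io
import wave

TARGET_RATE = 16_000  # sampling rate the model expects, per the note above


def wav_sample_rate(source):
    """Return the sampling rate (Hz) stored in a WAV file's header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return wav_file.getframerate()


def make_silent_wav(rate, seconds=1):
    """Build a mono 16-bit silent WAV in memory (for demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer


for rate in (48_000, 16_000):
    found = wav_sample_rate(make_silent_wav(rate))
    status = "ok" if found == TARGET_RATE else "needs resampling"
    print(f"{found} Hz: {status}")
```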

🚀 Quick Start

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the Tamil language, leveraging the Common Voice dataset.

✨ Features

Fine-tuned on Tamil language data from Common Voice.
Suitable for automatic speech recognition tasks in Tamil.

📦 Installation

The original model card gives no installation steps. The usage examples import torch, torchaudio, datasets, and transformers, so those packages must be installed.

💻 Usage Examples

Basic Usage

The model can be used directly (without a language model) as follows:

import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
# Load a small slice of the Tamil ("ta") Common Voice test split.
test_dataset = load_dataset("common_voice", "ta", split="test[:2%]")
# Model id assumed from the developer and model name given on this page.
processor = Wav2Vec2Processor.from_pretrained("Thanish/wav2vec2-large-xlsr-tamil")
model = Wav2Vec2ForCTC.from_pretrained("Thanish/wav2vec2-large-xlsr-tamil")
# Common Voice clips are sampled at 48 kHz; the model expects 16 kHz.
resampler = torchaudio.transforms.Resample(48_000, 16_000)
# Preprocessing the datasets.
# We need to read the audio files as arrays.
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch
test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
predicted_ids = torch.argmax(logits, dim=-1)
print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset["sentence"][:2])
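
Decoding without a language model here amounts to greedy CTC decoding: take the most likely token for each audio frame (the `torch.argmax` call), merge consecutive repeats, then drop the blank token. The sketch below illustrates that collapse step in plain Python. The toy vocabulary, the blank id of 0, and the function name `greedy_ctc_collapse` are made up for illustration; `processor.batch_decode` performs the equivalent step on the model's real vocabulary and also handles word delimiters and special tokens.

```python
from itertools import groupby

BLANK_ID = 0  # assumed id of the CTC blank token in this toy vocabulary
TOY_VOCAB = {1: "a", 2: "b", 3: "c"}  # hypothetical id-to-character map


def greedy_ctc_collapse(frame_ids, blank_id=BLANK_ID):
    """Merge consecutive repeated ids, then remove blanks."""
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    return [token_id for token_id in merged if token_id != blank_id]


# Per-frame argmax ids, read as: a a <blank> a b b <blank> c
frame_ids = [1, 1, 0, 1, 2, 2, 0, 3]
token_ids = greedy_ctc_collapse(frame_ids)
text = "".join(TOY_VOCAB[token_id] for token_id in token_ids)
print(text)  # aabc
```

The blank between the second and third frame groups is what lets the output keep two separate "a" characters instead of merging them into one.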

Advanced Usage

The model can be evaluated as follows on the Tamil test data of Common Voice.

import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re
# Load the full Tamil ("ta") Common Voice test split.
test_dataset = load_dataset("common_voice", "ta", split="test")
# Note: load_metric is deprecated in newer releases of datasets in favor of the evaluate library.
wer = load_metric("wer")
# Model id assumed from the developer and model name given on this page.
processor = Wav2Vec2Processor.from_pretrained("Thanish/wav2vec2-large-xlsr-tamil")
model = Wav2Vec2ForCTC.from_pretrained("Thanish/wav2vec2-large-xlsr-tamil")
model.to("cuda")
# Punctuation stripped from the reference transcripts before scoring.
# Adapt this list to include all special characters removed from the training data.
chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\“]'
resampler = torchaudio.transforms.Resample(48_000, 16_000)
# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch
test_dataset = test_dataset.map(speech_file_to_array_fn)
# Run batched inference on the GPU and decode the predicted token ids to text.
def evaluate(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids)
    return batch
result = test_dataset.map(evaluate, batched=True, batch_size=8)
print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
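
The evaluation script normalizes each reference transcript before scoring, so that punctuation and case differences are not counted as word errors. Below is a small standalone sketch of that step using a punctuation character class in the spirit of the one in the script. The function name `normalize_transcript` and the sample sentences are illustrative. Tamil script has no letter case, so `lower()` only affects any Latin characters in the text.

```python
import re

# Punctuation to strip; extend this set to match the characters
# removed from the training transcripts.
CHARS_TO_IGNORE = re.compile(r'[\,\?\.\!\-\;\:\"\“]')


def normalize_transcript(sentence):
    """Remove the listed punctuation marks and lowercase the text."""
    return CHARS_TO_IGNORE.sub("", sentence).lower()


print(normalize_transcript("Hello, World!"))
print(normalize_transcript("வணக்கம், நலமா?"))
```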

Test Result

Test WER: 100.00 % on the Common Voice Tamil test set
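
Word error rate (WER) is the word-level edit distance between prediction and reference (substitutions, deletions, and insertions) divided by the number of reference words. The following is a minimal pure-Python sketch, assuming whitespace tokenization. It is not the implementation behind the `wer` metric used in the evaluation script, and the example sentences are invented. It illustrates that a score of 100 % corresponds to as many word errors as there are words in the reference.

```python
def word_error_rate(reference, prediction):
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = prediction.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j predicted words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


print(word_error_rate("the cat sat", "the cat sat"))  # no errors
print(word_error_rate("the cat sat", "the dog sat"))  # 1 substitution in 3 words
print(word_error_rate("the cat sat", "a dog ran"))    # every word wrong
```

Because insertions count as errors, WER can exceed 100 % when the prediction contains many extra words.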

🔧 Technical Details

The Common Voice train and validation splits were used for training. The training script is available at https://colab.research.google.com/drive/1PC2SjxpcWMQ2qmRw21NbP38wtQQUa5os#scrollTo=YKBZdqqJG9Tv

📄 License

This model is released under the apache-2.0 license.
