wav2vec2-base-10k-voxpopuli-ft-hr Open-source Speech Recognition Model

Developed by facebook

A speech recognition model based on Facebook's Wav2Vec2 architecture, pretrained on the VoxPopuli corpus and fine-tuned on Croatian data

Speech Recognition

Transformers

#Croatian speech recognition #VoxPopuli pretraining #Multilingual speech processing

Downloads 20

Release Time: 3/2/2022

Model Overview

This automatic speech recognition (ASR) model is fine-tuned for Croatian and converts Croatian speech to text.

Model Features

Multi-stage training

Pretrained on large-scale unlabeled data first, then fine-tuned on labeled Croatian data

Efficient representation learning

Uses Wav2Vec2 architecture to learn effective speech representations directly from raw audio

Language-specific optimization

Fine-tuned on transcribed Croatian speech to improve recognition accuracy for that language
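
To make "directly from raw audio" concrete, the following minimal sketch estimates how many representation frames the convolutional feature encoder produces for a waveform of a given length. It assumes the standard Wav2Vec2 base encoder layout (seven 1-D convolutions with the kernel sizes and strides listed in the code); this page does not state the layout, and the function name `num_output_frames` is illustrative only.

```python
# Assumption: (kernel, stride) pairs of the standard Wav2Vec2 base feature encoder.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_output_frames(num_samples: int) -> int:
    """Return the number of encoder frames for an unpadded waveform."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        # Output length of a 1-D convolution without padding or dilation.
        length = (length - kernel) // stride + 1
    return length


# One second of 16 kHz audio yields roughly one frame per 20 ms.
frames_per_second = num_output_frames(16_000)
print(frames_per_second)
```

Under that assumption, each frame covers a hop of 320 samples, which is why the model emits far fewer predictions than there are audio samples.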

Model Capabilities

Croatian speech recognition

Audio-to-text conversion

Speech transcription

Use Cases

Speech transcription

Croatian speech transcription

Convert Croatian speech content into text format

Voice assistants

Croatian voice command recognition

Used for voice assistants and smart devices supporting Croatian

🚀 Wav2Vec2-Base-VoxPopuli-Finetuned

This is a fine-tuned version of Facebook's Wav2Vec2 base model. It was pretrained on the 10K unlabeled subset of the VoxPopuli corpus and fine-tuned on the transcribed Croatian data (see Table 1 of the paper for details).

🚀 Quick Start

This model performs automatic speech recognition for Croatian. It expects 16 kHz audio, so recordings at other sampling rates (such as the 48 kHz Common Voice clips in the usage example) must be resampled first.

✨ Features

Audio Processing: Built for audio input and automatic speech recognition.
Multilingual Adaptability: Pretrained on VoxPopuli, a multilingual speech corpus.
Fine-Tuned for Croatian: Fine-tuned on transcribed Croatian speech.

📦 Installation

The usage example relies on the Python libraries transformers, datasets, torchaudio, and torch. Install them with the following command:

pip install transformers datasets torchaudio torch

💻 Usage Examples

Basic Usage

#!/usr/bin/env python3
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import torchaudio
import torch

# load model & processor
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-10k-voxpopuli-ft-hr")
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-10k-voxpopuli-ft-hr")

# load dataset
ds = load_dataset("common_voice", "hr", split="validation[:1%]")

# common voice does not match target sampling rate
common_voice_sample_rate = 48000
target_sample_rate = 16000

resampler = torchaudio.transforms.Resample(common_voice_sample_rate, target_sample_rate)


# define mapping fn to read in sound file and resample
def map_to_array(batch):
    speech, _ = torchaudio.load(batch["path"])
    speech = resampler(speech)
    batch["speech"] = speech[0]
    return batch


# load all audio files
ds = ds.map(map_to_array)

# run inference on the first 5 data samples
inputs = processor(ds[:5]["speech"], sampling_rate=target_sample_rate, return_tensors="pt", padding=True)

# inference (gradients are not needed), then pick the most likely token per frame
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = torch.argmax(logits, dim=-1)

print(processor.batch_decode(predicted_ids))
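
The last two steps above (per-frame argmax followed by decoding) amount to greedy CTC decoding: consecutive repeated token IDs are collapsed, then blank tokens are dropped. The sketch below illustrates that idea in plain Python. The vocabulary, the blank index, and the function name `ctc_greedy_decode` are hypothetical and chosen for illustration; they are not the model's actual tokenizer settings.

```python
from itertools import groupby

# Hypothetical toy vocabulary; index 0 plays the role of the CTC blank token.
VOCAB = ["<pad>", "d", "o", "b", "a", "r", " "]
BLANK_ID = 0


def ctc_greedy_decode(frame_ids, vocab=VOCAB, blank_id=BLANK_ID):
    """Collapse consecutive repeats, drop blanks, and join the remaining symbols."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(vocab[i] for i in collapsed if i != blank_id)


# One predicted token ID per audio frame, as argmax over the logits would give.
frame_ids = [1, 1, 0, 2, 2, 2, 0, 3, 4, 4, 0, 5, 5]
print(ctc_greedy_decode(frame_ids))  # dobar
```

A blank between two identical IDs keeps them separate, which is how doubled letters survive the collapse step.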

📚 Documentation

Paper: VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux (Facebook AI)

For more information, please visit the official website: here

📄 License

This model is released under the CC-BY-NC-4.0 license.

Property	Details
Tags	audio, automatic-speech-recognition, voxpopuli
License	cc-by-nc-4.0
