wav2vec2-base-sv-voxpopuli-v2: an open-source speech model pre-trained on Swedish audio

Wav2vec2 Base Sv Voxpopuli V2

Developed by facebook

A speech model based on Facebook's Wav2Vec2 architecture, specifically pre-trained for Swedish using 16.3k hours of unlabeled data from the VoxPopuli corpus.

Speech Recognition

Transformers

Tags: Swedish speech recognition, unsupervised pre-training, 16kHz audio processing


Release Time : 3/2/2022

Model Overview

This is a foundational speech model for Swedish. It was pre-trained on unlabeled audio only, so it must be fine-tuned on labeled Swedish data before it can be used for speech-to-text tasks.

Model Features

Specialized for Swedish

Pre-trained on Swedish audio only, so its learned speech representations are tailored to the language.

Based on VoxPopuli Corpus

Pre-trained on 16.3k hours of unlabeled Swedish data from the VoxPopuli corpus.

16kHz Audio Support

Optimized for 16kHz sampled speech audio; ensure input audio matches this sampling rate.

Model Capabilities

Swedish speech recognition (after fine-tuning on labeled data)

Speech feature extraction

Use Cases

Speech-to-Text

Swedish Speech Transcription

Convert Swedish speech into text, once the model has been fine-tuned with a tokenizer

Speech Analysis

Swedish Speech Feature Analysis

Extract feature representations from Swedish speech

🚀 Wav2Vec2-base-VoxPopuli-V2

This is a base-size Wav2Vec2 model from Facebook, pretrained only on Swedish using 16.3k hours of unlabeled data from the VoxPopuli corpus. It aims to provide strong speech features for automatic speech recognition.

✨ Features

Pretrained only on Swedish (sv), using 16.3k hours of unlabeled data from the VoxPopuli corpus.
Pretrained on 16kHz sampled speech audio.

📚 Documentation

General Information

The model is a base-size Wav2Vec2 model from Facebook, pretrained only on Swedish (sv) using 16.3k hours of unlabeled data from the VoxPopuli corpus. It was pretrained on speech audio sampled at 16kHz, so make sure that your speech input is also sampled at 16kHz.
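Because the checkpoint is pretrained only, the most direct way to use it as-is is to read out its hidden states. The following is a minimal sketch of that, not an official recipe. It assumes the Hub identifier `facebook/wav2vec2-base-sv-voxpopuli-v2` (inferred from the model name), the standard Wav2Vec2 convolutional layout, and that `torch` and `transformers` are installed. The helper names are hypothetical.

```python
from typing import Sequence

# Assumed Hub identifier, inferred from the model name; verify before use.
MODEL_ID = "facebook/wav2vec2-base-sv-voxpopuli-v2"
SAMPLE_RATE = 16_000

# Kernel sizes and strides of the standard Wav2Vec2 convolutional encoder.
# Assumed here; the authoritative values live in the checkpoint's config.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples: int) -> int:
    """Number of feature frames the encoder emits for `num_samples` samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


def extract_features(waveform: Sequence[float]):
    """Return hidden states of shape (1, frames, hidden_size) for mono 16 kHz audio."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

    # Built explicitly rather than loaded from the Hub, so the sketch does not
    # depend on which preprocessing files the checkpoint ships.
    feature_extractor = Wav2Vec2FeatureExtractor(
        feature_size=1,
        sampling_rate=SAMPLE_RATE,
        padding_value=0.0,
        do_normalize=True,
    )
    model = Wav2Vec2Model.from_pretrained(MODEL_ID)
    model.eval()

    inputs = feature_extractor(
        waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(inputs.input_values)
    return outputs.last_hidden_state
```

Under these assumptions, one second of 16kHz audio (16,000 samples) yields 49 frames, roughly one feature vector every 20 ms. Calling `extract_features(waveform)` downloads the checkpoint on first use.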

Notes

This model does not have a tokenizer because it was pretrained on audio alone. To use it for speech recognition, a tokenizer must be created and the model must be fine-tuned on labeled Swedish (sv) speech with text transcriptions. Check out this blog for a more detailed explanation of how to fine-tune the model.
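As a rough illustration of the tokenizer step, the sketch below builds a character-level vocabulary from Swedish transcripts and uses it to attach a CTC head to the pretrained encoder. It is one common approach, not necessarily the one the authors intend. The Hub identifier, the special-token names (`[UNK]`, `[PAD]`, `|`), the `vocab.json` path and both function names are assumptions made for illustration.

```python
import json
from typing import Dict, Iterable

# Assumed Hub identifier, inferred from the model name; verify before use.
MODEL_ID = "facebook/wav2vec2-base-sv-voxpopuli-v2"


def build_char_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Map every character in the transcripts to an integer id for CTC training."""
    chars = sorted({ch for text in transcripts for ch in text.lower()})
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Represent the space as "|" so word boundaries are a visible token.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def load_for_fine_tuning(vocab: Dict[str, int], vocab_path: str = "vocab.json"):
    """Write the vocabulary to disk and return (tokenizer, model) ready for CTC fine-tuning."""
    # Imported lazily so build_char_vocab works without transformers installed.
    from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC

    with open(vocab_path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)

    tokenizer = Wav2Vec2CTCTokenizer(
        vocab_path,
        unk_token="[UNK]",
        pad_token="[PAD]",
        word_delimiter_token="|",
    )
    # The CTC output layer is newly initialized and must be trained.
    model = Wav2Vec2ForCTC.from_pretrained(
        MODEL_ID,
        vocab_size=len(tokenizer),
        pad_token_id=tokenizer.pad_token_id,
        ctc_loss_reduction="mean",
    )
    return tokenizer, model
```

The vocabulary should be built from the transcripts of the labeled training set so that Swedish letters such as å, ä and ö are covered. The returned model still needs a full fine-tuning run before its transcriptions are meaningful.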

Paper

VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

Authors

Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI.

More Information

See the official website for more information.

📄 License

This model is released under the cc-by-nc-4.0 license.

Property	Details
Tags	audio, automatic-speech-recognition, voxpopuli-v2
Datasets	voxpopuli
Model Type	Wav2Vec2-base-VoxPopuli-V2
Training Data	16.3k hours of unlabeled Swedish data from the VoxPopuli corpus
License	cc-by-nc-4.0
Inference	false

⚠️ Important Note

This model does not have a tokenizer because it was pretrained on audio alone. To use it for speech recognition, a tokenizer must be created and the model must be fine-tuned on labeled Swedish speech with text transcriptions.

💡 Usage Tip

When using the model, make sure that your speech input is sampled at 16kHz as the model is pretrained on 16kHz sampled speech audio.
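A quick way to catch a sampling-rate mismatch is to check the file header before passing audio to the model. The sketch below uses only Python's standard `wave` module, so it applies to uncompressed WAV files. The function names are hypothetical, and resampling itself is left to an audio library of your choice.

```python
import wave

EXPECTED_SAMPLE_RATE = 16_000  # the rate the model was pretrained on


def wav_sample_rate(path: str) -> int:
    """Read the sampling rate (in Hz) from a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def assert_16khz(path: str) -> None:
    """Raise ValueError if the WAV file is not sampled at 16 kHz."""
    rate = wav_sample_rate(path)
    if rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"{path} is sampled at {rate} Hz; "
            f"resample it to {EXPECTED_SAMPLE_RATE} Hz before using the model."
        )
```

Running `assert_16khz("clip.wav")` on each input file fails fast with a clear message instead of silently producing degraded features.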
