🚀 Wav2Vec2-base-VoxPopuli-V2
This is the base variant of Facebook's Wav2Vec2, pretrained only on Swedish (sv) using 16.3k hours of unlabeled data from the VoxPopuli corpus. It is intended to provide speech representations that can be fine-tuned for automatic speech recognition.
✨ Features
- Pretrained only on Swedish (sv), using 16.3k hours of unlabeled data from the VoxPopuli corpus.
- Pretrained on speech audio sampled at 16kHz.
📚 Documentation
General Information
The model is the base variant of Facebook's Wav2Vec2, pretrained only on Swedish (sv) using 16.3k hours of unlabeled data from the VoxPopuli corpus. It was pretrained on speech audio sampled at 16kHz, so make sure your speech input is also sampled at 16kHz when using the model.
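The 16kHz requirement can be checked before any audio reaches the model. The following is a minimal sketch using only Python's standard `wave` module. The helper name `check_sample_rate` and the constant `EXPECTED_RATE` are illustrative assumptions and not part of the model or any library, and the sketch only handles uncompressed WAV files.

```python
import wave

EXPECTED_RATE = 16_000  # the model was pretrained on 16kHz audio


def check_sample_rate(path, expected=EXPECTED_RATE):
    """Return the WAV file's sample rate, raising if it differs from `expected`."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
    if rate != expected:
        raise ValueError(
            f"{path} is sampled at {rate} Hz; resample it to {expected} Hz first"
        )
    return rate
```

If the check fails, the audio should be resampled with a dedicated audio library before it is passed to the model. Which library to use is left open here.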
Notes
This model does not have a tokenizer because it was pretrained on audio alone. To use it for speech recognition, a tokenizer must be created and the model must be fine-tuned on labeled data (audio with transcriptions) in Swedish (sv). Check out this blog for a more detailed explanation of how to fine-tune the model.
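As a rough illustration of the "create a tokenizer" step, the sketch below builds a character-level vocabulary from transcripts, which is a common starting point for CTC fine-tuning. The function names, the example transcripts, and the special tokens `[UNK]`, `[PAD]`, and `|` are assumptions chosen for illustration; they are not prescribed by this model.

```python
import json


def build_char_vocab(transcripts):
    """Map every character in the transcripts to an integer id."""
    chars = sorted({ch for text in transcripts for ch in text.lower()})
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Replace the space with a visible word delimiter (assumed to be "|";
    # this assumes "|" does not already occur in the transcripts).
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Append special tokens for unknown characters and padding.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def save_vocab(vocab, path):
    """Write the vocabulary to a JSON file, keeping non-ASCII letters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)


# Hypothetical Swedish transcripts standing in for a real labeled dataset.
transcripts = ["hej världen", "god morgon"]
vocab = build_char_vocab(transcripts)
```

A vocabulary file written with `save_vocab` could then serve as the `vocab_file` for a CTC tokenizer such as `Wav2Vec2CTCTokenizer` from the `transformers` library, with the unknown, padding, and word-delimiter tokens set to match the ones used above.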
Paper
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Authors
Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI.
More Information
See the official website for more information.
📄 License
This model is released under the cc-by-nc-4.0 license.
| Property | Details |
|---|---|
| Tags | audio, automatic-speech-recognition, voxpopuli-v2 |
| Datasets | voxpopuli |
| Model Type | Wav2Vec2-base-VoxPopuli-V2 |
| Training Data | 16.3k unlabeled data of the VoxPopuli corpus in Swedish |
| License | cc-by-nc-4.0 |
| Inference | false |
⚠️ Important Note
This model does not have a tokenizer because it was pretrained on audio alone. To use it for speech recognition, a tokenizer must be created and the model must be fine-tuned on labeled text data in Swedish.
💡 Usage Tip
When using the model, make sure that your speech input is sampled at 16kHz as the model is pretrained on 16kHz sampled speech audio.