🚀 Model Card for mhubert-base-25hz
This is a version of HuBERT by Meta, introduced in TWIST, which has proven valuable as a speech tokeniser for training speech language models (SpeechLMs).
🚀 Quick Start
This model requires a recent version of transformers (`transformers>=4.48`). Make sure you have it installed. Then, you can use the model as follows:
```python
from transformers import HubertModel

model = HubertModel.from_pretrained('slprl/mhubert-base-25hz')
```
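A minimal sketch of running the loaded model to extract features at the 25Hz rate. The dummy input and the printed shape are illustrative; substitute a real 16 kHz waveform in practice:

```python
import torch
from transformers import HubertModel

model = HubertModel.from_pretrained('slprl/mhubert-base-25hz')
model.eval()

# One second of dummy 16 kHz audio -- replace with a real speech waveform
wav = torch.randn(1, 16000)
with torch.no_grad():
    features = model(wav).last_hidden_state

# At the 25Hz feature rate, one second of audio yields ~25 frames
print(features.shape)
```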
📚 Documentation
Model Details
Model Description
This Hubert model was introduced in TWIST. We encourage you to refer to it for comprehensive details.
It was trained on a diverse mixture of datasets: Multilingual LS, Vox Populi, Common Voice, Spotify, and Fisher. This HuBERT base model was trained for 3 iterations with the default 50Hz feature rate. For the 4th iteration, an additional convolutional layer with a stride of 2 was added to the CNN encoder, resulting in 25Hz features.
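The feature rate follows from the CNN encoder's total downsampling factor. A quick sanity check, assuming the standard HuBERT base strides of (5, 2, 2, 2, 2, 2, 2) plus the extra stride-2 layer described above:

```python
# Standard HuBERT base conv strides give a 320x downsample (16000 / 320 = 50 Hz);
# the appended stride-2 layer doubles this to 640x (16000 / 640 = 25 Hz).
strides = [5, 2, 2, 2, 2, 2, 2, 2]  # last entry is the added stride-2 layer
downsample = 1
for s in strides:
    downsample *= s

print(downsample, 16000 // downsample)  # 640 25
```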
We converted the original Fairseq release to Hugging Face 🤗 using the conversion script (after adding support for this architecture) and verified that the results are identical.
| Property | Details |
|---|---|
| Developed by | Hassid et al. |
| Shared by | SLP-RL |
| Model type | transformers.HubertModel |
| Languages | Multilingual |
| License | MIT, see textlesslib license for full details |
Model Sources
- Repository: https://github.com/facebookresearch/textlesslib/tree/main/examples/twist
- Paper: https://arxiv.org/abs/2305.13009
📄 License
The model is under the MIT license. See textlesslib license for full details.
📚 Citation
BibTeX:
```bibtex
@article{hassid2024textually,
  title={Textually pretrained speech language models},
  author={Hassid, Michael and Remez, Tal and Nguyen, Tu Anh and Gat, Itai and Conneau, Alexis and Kreuk, Felix and Copet, Jade and Defossez, Alexandre and Synnaeve, Gabriel and Dupoux, Emmanuel and others},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}
```
👥 Model Card Authors
Gallil Maimon