Hubert-base-ls960 Open-source Speech Model - Empowering Speech Recognition and Efficiently Learning Speech Features

Hubert Base Ls960

Developed by facebook

HuBERT is a self-supervised speech representation learning model that learns speech features through BERT-like prediction loss, suitable for tasks such as speech recognition.

Speech Recognition

Transformers

EnglishOpen Source License:Apache-2.0 #Self-supervised speech learning #BERT-style prediction loss #16kHz audio processing

Downloads 406.60k

Release Time : 3/2/2022

Model Overview

HuBERT (Hidden Unit BERT) is a self-supervised speech representation learning method that provides target labels for BERT-like prediction loss through an offline clustering step. The model is pre-trained on speech audio sampled at 16kHz and is suitable for tasks such as speech recognition, generation, and compression.

Model Features

Self-supervised learning

Provides target labels through unsupervised clustering steps, enabling speech representation learning without the need for large amounts of labeled data.

Efficient speech representation

Combines acoustic and language models on continuous input to learn efficient speech feature representations.

High performance

Outperforms or matches the state-of-the-art wav2vec 2.0 model on Librispeech and Libri-light benchmarks.

Model Capabilities

Speech representation learning

Speech recognition

Speech generation

Speech compression

Use Cases

Speech recognition

Automatic speech transcription

Converts speech audio into text, suitable for scenarios such as meeting minutes and subtitle generation.

Performs excellently on the Librispeech test set, reducing relative word error rate by 13-19%.

Speech generation

Speech synthesis

Combines with other models to generate natural speech.

🚀 Hubert-Base

A base model pretrained on 16kHz sampled speech audio. Ideal for speech-related tasks with proper fine - tuning.

🚀 Quick Start

The Hubert - Base model is a pre - trained model on 16kHz sampled speech audio. When using this model, ensure that your speech input is also sampled at 16kHz.

Note: This model doesn't have a tokenizer as it was pretrained on audio alone. To use this model for speech recognition, you need to create a tokenizer and fine - tune the model on labeled text data. Check out this blog for a more detailed explanation of how to fine - tune the model.

✨ Features

Self - supervised Learning: Addresses unique challenges in speech representation learning, such as multiple sound units, no pre - training lexicon, and variable sound unit lengths.
Effective Clustering: Utilizes an offline clustering step to provide aligned target labels for a BERT - like prediction loss.
High Performance: Either matches or improves upon the state - of - the - art wav2vec 2.0 performance on Librispeech and Libri - light benchmarks with different fine - tuning subsets.

📚 Documentation

📄 Paper

The research paper HuBERT: Self - supervised Speech Representation Learning by Masked Prediction of Hidden Units provides in - depth details about the model.

Authors: Wei - Ning Hsu, Benjamin Bolte, Yao - Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed
Abstract: Self - supervised approaches for speech representation learning face three unique problems. To tackle them, the Hidden - Unit BERT (HuBERT) approach is proposed. It uses an offline clustering step for a BERT - like prediction loss. By applying the prediction loss over masked regions, it forces the model to learn a combined acoustic and language model. HuBERT relies on the consistency of the unsupervised clustering. Starting with a simple k - means teacher of 100 clusters and two iterations of clustering, it shows excellent performance on benchmarks. A 1B parameter model can achieve up to 19% and 13% relative WER reduction on more challenging evaluation subsets.

🔗 Original Model

The original model can be found under https://github.com/pytorch/fairseq/tree/master/examples/hubert.

📦 Datasets and Tags

Datasets: librispeech_asr
Tags: speech

💻 Usage Examples

See this blog for more information on how to fine - tune the model. Note that the class Wav2Vec2ForCTC has to be replaced by HubertForCTC.

📄 License

This model is licensed under the Apache - 2.0 license.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご