Hubert-Base for Intent Classification
A ported model for intent classification based on Hubert, leveraging the SUPERB dataset.
Quick Start
This README provides a detailed introduction to the Hubert-Base model for intent classification, including its description, task and dataset details, usage examples, evaluation results, and citation information.
⨠Features
Installation
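The original document gives no installation steps. The usage example imports torch, librosa, datasets, and transformers, so these packages must be available before running it.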
Usage Examples
Basic Usage
import torch
import librosa
from datasets import load_dataset
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

def map_to_array(example):
    # Load each file as 16 kHz mono audio, the format the model expects
    speech, _ = librosa.load(example["file"], sr=16000, mono=True)
    example["speech"] = speech
    return example

# Load the demo split and attach the decoded waveform to every example
dataset = load_dataset("anton-l/superb_demo", "ic", split="test")
dataset = dataset.map(map_to_array)

model = HubertForSequenceClassification.from_pretrained("superb/hubert-base-superb-ic")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("superb/hubert-base-superb-ic")

# Classify the first four utterances, padded to a common length
inputs = feature_extractor(dataset[:4]["speech"], sampling_rate=16000, padding=True, return_tensors="pt")
logits = model(**inputs).logits

# The 24 logits hold three label groups: action [0, 6), object [6, 20), location [20, 24)
action_ids = torch.argmax(logits[:, :6], dim=-1).tolist()
action_labels = [model.config.id2label[_id] for _id in action_ids]
object_ids = torch.argmax(logits[:, 6:20], dim=-1).tolist()
object_labels = [model.config.id2label[_id + 6] for _id in object_ids]
location_ids = torch.argmax(logits[:, 20:24], dim=-1).tolist()
location_labels = [model.config.id2label[_id + 20] for _id in location_ids]
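The slicing at the end of the example can be wrapped in a small helper. The sketch below is a pure-Python illustration that works on one row of logits given as a list of floats. The slot boundaries come from the example above. The function name `decode_intent`, the `SLOTS` table, and the placeholder label names are assumptions made for illustration and are not part of the model or of transformers.

```python
from typing import Dict, Sequence, Tuple

# Slot boundaries taken from the usage example: [0, 6), [6, 20), [20, 24).
SLOTS: Dict[str, Tuple[int, int]] = {
    "action": (0, 6),
    "object": (6, 20),
    "location": (20, 24),
}


def decode_intent(logits: Sequence[float], id2label: Dict[int, str]) -> Dict[str, str]:
    """Pick the highest-scoring label inside each slot's slice of one logit row."""
    if len(logits) != 24:
        raise ValueError(f"expected 24 logits, got {len(logits)}")
    result = {}
    for slot, (start, end) in SLOTS.items():
        # Argmax restricted to this slot's range; the index is already global,
        # so no offset has to be added before the label lookup.
        best = max(range(start, end), key=lambda i: logits[i])
        result[slot] = id2label[best]
    return result


# Hypothetical placeholder labels; the real names come from model.config.id2label.
demo_id2label = {i: f"label_{i}" for i in range(24)}
demo_logits = [0.0] * 24
demo_logits[2] = 1.5   # strongest score in the action slot
demo_logits[10] = 2.0  # strongest score in the object slot
demo_logits[23] = 0.7  # strongest score in the location slot
print(decode_intent(demo_logits, demo_id2label))
```

With the real model, the same helper could be applied to each row of `logits.tolist()` together with `model.config.id2label`.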
Documentation
Model description
This is a ported version of S3PRL's Hubert for the SUPERB Intent Classification task.
The base model is hubert-base-ls960, which is pretrained on speech audio sampled at 16 kHz.
When using the model, make sure that your speech input is also sampled at 16 kHz.
For more information, refer to SUPERB: Speech processing Universal PERformance Benchmark (arXiv:2105.01051).
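The 16 kHz requirement can be checked before inference. The sketch below reads the sample rate from a WAV header with Python's standard `wave` module and raises if it does not match. It assumes uncompressed PCM WAV input, and the helper name `check_sample_rate` is hypothetical. In the usage example this check is not strictly needed, because `librosa.load(..., sr=16000)` resamples while loading.

```python
import wave

EXPECTED_RATE = 16_000  # Hz, the rate the base model was pretrained on


def check_sample_rate(path: str, expected: int = EXPECTED_RATE) -> int:
    """Return the WAV file's sample rate, raising if it differs from `expected`."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate != expected:
        raise ValueError(f"{path}: sample rate is {rate} Hz, expected {expected} Hz")
    return rate
```

Failing early like this avoids silently feeding audio at the wrong rate to the model, which would not raise an error on its own.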
Task and dataset description
Intent Classification (IC) classifies utterances into predefined classes to determine the intent of speakers. SUPERB uses the Fluent Speech Commands dataset, where each utterance is tagged with three intent labels: action, object, and location.
For the original model's training and evaluation instructions, refer to the S3PRL downstream task README.
License
This project is licensed under the Apache-2.0 license.
Eval results
The evaluation metric is accuracy.
| Property | Details |
|---|---|
| Model Type | Hubert-Base for Intent Classification |
| Training Data | SUPERB dataset (Fluent Speech Commands) |

| | s3prl | transformers |
|---|---|---|
| test | 0.9834 | N/A |
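As a rough illustration of how an accuracy figure like the one above could be computed for three-slot intents, the sketch below counts an utterance as correct only when the action, object, and location all match the reference. That all-slots rule is an assumption here, since this README only states that the metric is accuracy. The function name `intent_accuracy` and the sample labels are made up for illustration.

```python
from typing import List, Tuple

Intent = Tuple[str, str, str]  # (action, object, location)


def intent_accuracy(predictions: List[Intent], references: List[Intent]) -> float:
    """Fraction of utterances whose predicted triple equals the reference triple."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Tuple equality requires all three slots to match (assumed scoring rule).
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


# Made-up labels for illustration only.
refs = [("activate", "lights", "kitchen"), ("increase", "volume", "none")]
preds = [("activate", "lights", "kitchen"), ("increase", "heat", "none")]
print(intent_accuracy(preds, refs))  # 0.5
```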
BibTeX entry and citation info
@article{yang2021superb,
title={SUPERB: Speech processing Universal PERformance Benchmark},
author={Yang, Shu-wen and Chi, Po-Han and Chuang, Yung-Sung and Lai, Cheng-I Jeff and Lakhotia, Kushal and Lin, Yist Y and Liu, Andy T and Shi, Jiatong and Chang, Xuankai and Lin, Guan-Ting and others},
journal={arXiv preprint arXiv:2105.01051},
year={2021}
}