Hubert-base-superb-ic open-source speech intent classification model - Accurately identify the intentions behind speech

Hubert Base Superb Ic

Developed by superb

A speech intent classification model fine-tuned on the SUPERB intent classification task, based on the Hubert-Base-LS960 pre-trained model

Audio Classification

Transformers

English

Open Source License: Apache-2.0

#Speech Intent Recognition #Multi-label Classification #16kHz Audio Processing

Downloads 578

Release Time: 3/2/2022

Model Overview

This model is used for speech intent classification, categorizing speech inputs into predefined intent categories, including action, object, and location labels.

Model Features

Based on Hubert Pre-trained Model

Uses hubert-base-ls960 as the base model, featuring robust speech feature extraction capabilities

Multi-label Classification

Can simultaneously recognize action, object, and location intent labels in speech

High Accuracy

The original S3PRL model reaches 98.34% test accuracy on the Fluent Speech Commands dataset; no score is reported for the transformers port

Model Capabilities

Speech Intent Recognition

Multi-label Classification

Speech Feature Extraction

Use Cases

Smart Home Control

Voice-Controlled Appliances

Recognizes user commands for controlling smart home appliances

Accurately identifies actions (e.g., turn on/off), objects (e.g., lights/AC), and locations (e.g., living room/bedroom)

Voice Assistants

Understanding User Intent

Helps voice assistants comprehend the core intent of user requests

Improves the accuracy and naturalness of voice assistant interactions

🚀 Hubert-Base for Intent Classification

A port of S3PRL's Hubert model for the SUPERB Intent Classification task, trained on the Fluent Speech Commands dataset.

🚀 Quick Start

This README provides a detailed introduction to the Hubert-Base model for intent classification, including its description, task and dataset details, usage examples, evaluation results, and citation information.

✨ Features

Ported from S3PRL's Hubert for the SUPERB Intent Classification task.
Based on the hubert-base-ls960 model, pretrained on 16kHz sampled speech audio.
Applicable to the Intent Classification task using the Fluent Speech Commands dataset.

📦 Installation

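The original model card gives no installation steps. The usage example imports torch, librosa, datasets, and transformers, so these packages must be available in the environment.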

💻 Usage Examples

Basic Usage

import torch
import librosa
from datasets import load_dataset
from transformers import HubertForSequenceClassification, Wav2Vec2FeatureExtractor

def map_to_array(example):
    speech, _ = librosa.load(example["file"], sr=16000, mono=True)
    example["speech"] = speech
    return example

# load a demo dataset and read audio files
dataset = load_dataset("anton-l/superb_demo", "ic", split="test")
dataset = dataset.map(map_to_array)

model = HubertForSequenceClassification.from_pretrained("superb/hubert-base-superb-ic")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("superb/hubert-base-superb-ic")

# compute attention masks and normalize the waveform if needed
inputs = feature_extractor(dataset[:4]["speech"], sampling_rate=16000, padding=True, return_tensors="pt")

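# run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# the logits are read as three slots; take the argmax inside each slot
# and add the slot's starting offset to recover the global label id

# action slot: logits 0-5
action_ids = torch.argmax(logits[:, :6], dim=-1).tolist()
action_labels = [model.config.id2label[_id] for _id in action_ids]

# object slot: logits 6-19
object_ids = torch.argmax(logits[:, 6:20], dim=-1).tolist()
object_labels = [model.config.id2label[_id + 6] for _id in object_ids]

# location slot: logits 20-23
location_ids = torch.argmax(logits[:, 20:24], dim=-1).tolist()
location_labels = [model.config.id2label[_id + 20] for _id in location_ids]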
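The slot decoding in the example can also be written as one small helper that returns a single action/object/location record per utterance. The following is a minimal pure-Python sketch, not part of the original model card: `SLOTS`, `decode_intent`, and the placeholder label names are hypothetical, and the slot boundaries are copied from the slicing used in the example.

```python
# Slot boundaries copied from the slicing in the usage example:
# logits 0-5 are actions, 6-19 are objects, 20-23 are locations.
SLOTS = {"action": (0, 6), "object": (6, 20), "location": (20, 24)}


def decode_intent(logit_row, id2label, slots=SLOTS):
    """Return {slot_name: label} by taking the argmax inside each slot's index range."""
    intent = {}
    for name, (start, end) in slots.items():
        segment = logit_row[start:end]
        # index of the largest logit within this slot
        best = max(range(len(segment)), key=segment.__getitem__)
        # shift by the slot offset to get the global label id
        intent[name] = id2label[start + best]
    return intent


# Placeholder label names for illustration only (assumption);
# a real run would pass model.config.id2label instead.
demo_id2label = {i: f"label_{i}" for i in range(24)}

# One fake logit row with a clear winner in each slot.
demo_row = [0.0] * 24
demo_row[2] = 1.5
demo_row[9] = 0.7
demo_row[21] = 2.0

print(decode_intent(demo_row, demo_id2label))
```

With the real model, each row of `logits.tolist()` could be passed as `logit_row` together with `model.config.id2label`.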

📚 Documentation

Model description

This is a ported version of S3PRL's Hubert for the SUPERB Intent Classification task.

The base model is hubert-base-ls960, which is pretrained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
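The usage example's `librosa.load(..., sr=16000)` call resamples on load, so the 16 kHz requirement mainly matters when audio is read some other way. As a minimal sketch, assuming the input is an uncompressed PCM WAV file, the standard-library `wave` module can report the file's sample rate before it reaches the model. The helper names `wav_sample_rate` and `needs_resampling` are hypothetical.

```python
import wave

EXPECTED_RATE = 16000  # sample rate stated in the model description


def wav_sample_rate(path):
    """Read the sample rate (in Hz) from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """Return True when the file's sample rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate
```

A file for which `needs_resampling` returns `True` would have to be resampled to 16 kHz before feature extraction.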

For more information refer to SUPERB: Speech processing Universal PERformance Benchmark

Task and dataset description

Intent Classification (IC) classifies utterances into predefined classes to determine the intent of speakers. SUPERB uses the Fluent Speech Commands dataset, where each utterance is tagged with three intent labels: action, object, and location.

For the original model's training and evaluation instructions refer to the S3PRL downstream task README.

📄 License

This project is licensed under the Apache-2.0 license.

📊 Eval results

The evaluation metric is accuracy.

Property	Details
Model Type	Hubert-Base for Intent Classification
Training Data	Fluent Speech Commands (the dataset SUPERB uses for intent classification)

Split	s3prl	transformers
test	0.9834	N/A
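The card does not spell out how accuracy is computed over the three labels. The sketch below assumes an utterance counts as correct only when its action, object, and location all match the reference; that is an assumption about the metric, not something stated here. The function name `intent_accuracy` and the label values are illustrative only.

```python
def intent_accuracy(predictions, references):
    """Fraction of utterances whose (action, object, location) triple matches exactly.

    Assumption: a prediction is correct only if all three slots are correct.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one utterance")
    correct = sum(tuple(p) == tuple(r) for p, r in zip(predictions, references))
    return correct / len(references)


# Illustrative triples (made-up labels): the second prediction gets the object wrong.
preds = [
    ("turn on", "lights", "living room"),
    ("turn off", "AC", "bedroom"),
    ("turn on", "AC", "bedroom"),
]
refs = [
    ("turn on", "lights", "living room"),
    ("turn off", "lights", "bedroom"),
    ("turn on", "AC", "bedroom"),
]

print(intent_accuracy(preds, refs))
```

Under this stricter definition, a single wrong slot makes the whole utterance count as an error, so per-slot accuracy would be at least as high as the reported figure.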

BibTeX entry and citation info

@article{yang2021superb,
  title={SUPERB: Speech processing Universal PERformance Benchmark},
  author={Yang, Shu-wen and Chi, Po-Han and Chuang, Yung-Sung and Lai, Cheng-I Jeff and Lakhotia, Kushal and Lin, Yist Y and Liu, Andy T and Shi, Jiatong and Chang, Xuankai and Lin, Guan-Ting and others},
  journal={arXiv preprint arXiv:2105.01051},
  year={2021}
}
