# Voice Model for Dental Click Identification

This model uses the Wav2vec2 architecture to identify dental click utterances in speech, offering high accuracy on a limited dataset.
## 🚀 Quick Start

This model can be used via the `transformers` library or the Hugging Face hosted Inference API.
## ⚠️ Important Note

Do not use the 'Record from browser' option, as the model may misidentify mouse clicks as speech utterances. Audio files for upload should be 1 second long, in WAV format, with 16-bit signed integer PCM encoding.
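To check a clip against these constraints before uploading, something like the following can be used (a minimal sketch assuming the `soundfile` package; the file name is a placeholder):

```python
import soundfile as sf

# Inspect the container format, sample encoding, and duration of the clip.
info = sf.info("your_audio_file.wav")  # placeholder path

assert info.format == "WAV", "file must be a WAV container"
assert info.subtype == "PCM_16", "samples must be 16-bit signed integer PCM"
assert abs(info.duration - 1.0) < 0.01, "clip should be 1 second long"
```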
## ✨ Features

- Specific Task: Trained for keyword spotting, specifically identifying dental click utterances in speech.
- Limited Training: Trained on a limited quantity of speech (~1.5 hours) from a single speaker.
- High Accuracy: Achieved 97% accuracy on a 20% hold-out test set.
## 📦 Installation

No specific installation steps are provided in the original README.
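The usage examples below assume the following packages (an assumed minimal set, since the original README lists none):

```bash
pip install transformers torch soundfile datasets
```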
## 💻 Usage Examples

### Basic Usage
```python
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
import soundfile as sf
import torch

model_name = "your_model_name"  # replace with the model's Hub ID
model = AutoModelForAudioClassification.from_pretrained(model_name)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

# The feature extractor expects a waveform array, not a file path;
# soundfile is one way to load it (loading method not specified in the original).
audio_path = "your_audio_file.wav"
speech, sampling_rate = sf.read(audio_path)

inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_class_id = logits.argmax(-1).item()
predicted_label = model.config.id2label[predicted_class_id]
print(f"Predicted label: {predicted_label}")
```
### Advanced Usage

```python
from transformers import TrainingArguments, Trainer

# Hyperparameters as given in the original README.
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

# Placeholders: the labeled click/non-click datasets are not published here.
train_dataset = ...
eval_dataset = ...

# `model` is the audio classification model loaded in the Basic Usage example.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```
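The README does not show how the reported accuracy was computed; one conventional approach is to pass a `compute_metrics` callback to the `Trainer` (a minimal NumPy sketch, not taken from the original):

```python
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred.predictions holds the logits; eval_pred.label_ids the true labels.
    predictions = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": float((predictions == eval_pred.label_ids).mean())}

# Usage: Trainer(..., compute_metrics=compute_metrics), then trainer.evaluate().
```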
## 📚 Documentation

### Model Description

The model utilizes the Wav2vec2 architecture trained on the SUPERB dataset for the keyword spotting task. It was fine-tuned to identify [dental click utterances](https://en.wikipedia.org/wiki/Dental_click) in speech.

The model was trained for 10 epochs on a limited quantity of speech (~1.5 hours) from a single speaker. It should therefore not be assumed to generalize to other speakers or languages without further training data or rigorous testing.

The model was evaluated for accuracy on a hold-out test set of 20% of the available data and scored 97%.
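For reference, a 20% hold-out split like the one described above can be produced with the `datasets` library (a sketch; the toy data is illustrative, since the actual dataset is not published):

```python
from datasets import Dataset

# Toy stand-in for the labeled clips; the real data is not released with this model.
dataset = Dataset.from_dict({
    "file": ["clip_0.wav", "clip_1.wav", "clip_2.wav", "clip_3.wav", "clip_4.wav"],
    "label": [1, 0, 1, 0, 1],
})

# 80/20 split, matching the hold-out evaluation used for the reported accuracy.
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_dataset, eval_dataset = splits["train"], splits["test"]
```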
## 🔧 Technical Details

The model is based on the Wav2vec2 architecture. It was trained on the SUPERB dataset and then fine-tuned for the specific task of identifying dental click utterances. The limited training data (in both quantity and number of speakers) may affect its generalizability.
## 📄 License

No license information is provided in the original README.
| Property | Details |
|----------|---------|
| Model Type | Utilizes the Wav2vec2 architecture for keyword spotting and dental click identification |
| Training Data | SUPERB dataset, plus limited speech data (~1.5 hours) from one speaker |