
Voice Safety Classifier

Developed by Roblox
A voice content safety detection model built on the WavLM Base Plus architecture, used to identify toxic content in voice chat.
Downloads: 11.55k
Released: 6/28/2024

Model Overview

This model is a large-scale classification model specifically designed to detect toxic content in voice chat audio, including profanity, explicit content, racial discrimination, bullying, and other types of violations.

Model Features

Multi-label classification
Capable of detecting multiple types of violations in voice content simultaneously, including profanity, explicit content, racial discrimination, and bullying.
High accuracy
Achieves an average accuracy of 94.48% on manually annotated datasets.
Large-scale training data
Fine-tuned using 2,374 hours of voice chat audio clips.
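The multi-label setup described above can be sketched in a few lines: instead of a softmax over mutually exclusive classes, each violation type gets an independent sigmoid score, so one clip can trigger several labels at once. The label names and logit values below are illustrative assumptions, not the model's actual output schema.

```python
import math

# Illustrative label set; the model's real output schema may differ.
LABELS = ["Profanity", "Sexual", "Racist", "Bullying"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_scores(logits):
    """Map raw per-label logits to independent probabilities.

    Unlike softmax, each label is scored on its own, which is what
    lets a single clip be flagged for multiple violation types.
    """
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}

# Hypothetical logits for one audio clip:
scores = multilabel_scores([2.1, -3.0, -1.2, 0.4])
```

Because the scores are independent, the per-label probabilities need not sum to 1; each one is thresholded separately at moderation time.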

Model Capabilities

Voice content classification
Toxic content detection
Multi-label prediction

Use Cases

Content safety
Voice chat monitoring
Real-time monitoring of inappropriate content in voice chat platforms
Effectively identifies various types of inappropriate voice content.
Community management
Automatically flags potentially harmful voice content for manual review
Reduces manual review workload and improves moderation efficiency.
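The flag-for-review workflow above could be wired up as follows: a clip is routed to human moderators only when some per-label score crosses a threshold. The threshold value and score dictionary here are assumptions for illustration, not documented defaults.

```python
# Hypothetical moderation hook: flag a clip for manual review when any
# per-label score exceeds a tunable threshold (0.5 is an assumed value).
REVIEW_THRESHOLD = 0.5

def flag_for_review(scores: dict) -> list:
    """Return the violation labels whose score crosses the threshold;
    an empty list means the clip can skip manual review entirely."""
    return [label for label, s in scores.items() if s >= REVIEW_THRESHOLD]

# Example scores for one clip (made up for illustration):
flags = flag_for_review({"Profanity": 0.91, "Sexual": 0.05,
                         "Racist": 0.12, "Bullying": 0.64})
```

Tuning the threshold trades recall against reviewer workload: lowering it catches more borderline clips at the cost of more manual reviews.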