🚀 🙊 Detoxify
Toxic Comment Classification with ⚡ Pytorch Lightning and 🤗 Transformers
This project provides trained models and code to predict toxic comments in three Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, and Multilingual toxic comment classification.
🚀 Quick Start
Installation
pip install detoxify
Quick Prediction
```python
from detoxify import Detoxify

# each model takes either a string or a list of strings
results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1', 'example text 2'])

input_text = ['example text', 'exemple de texte', 'texto de ejemplo', 'testo di esempio', 'texto de exemplo', 'örnek metin', 'пример текста']
results = Detoxify('multilingual').predict(input_text)

# optional: display results nicely (requires pandas)
import pandas as pd
print(pd.DataFrame(results, index=input_text).round(5))
```
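`predict` returns a plain Python dictionary mapping each label to a score (or to a list of scores when a list of inputs is given), so results can also be inspected without pandas. A minimal sketch; the `toxicity` key used below is an assumption about the label names of the installed checkpoint:

```python
from detoxify import Detoxify

results = Detoxify('original').predict(['you are amazing', 'you are pathetic'])

# results maps each label to one score per input text
for label, scores in results.items():
    print(label, [round(float(s), 4) for s in scores])

# e.g. flag inputs above an arbitrary illustrative threshold of 0.5
flagged = [i for i, s in enumerate(results['toxicity']) if s > 0.5]
print('flagged indices:', flagged)
```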
✨ Features
Model Capabilities
- Trained on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification (see the sketch after this list for how to inspect the labels each variant returns).
- The multilingual model is trained on 7 languages: English, French, Spanish, Italian, Portuguese, Turkish, and Russian.
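The labels each variant predicts depend on the challenge it was trained on. A quick way to inspect them (a sketch; each call downloads the corresponding checkpoint on first use):

```python
from detoxify import Detoxify

# print the label names returned by each pretrained variant
for name in ('original', 'unbiased', 'multilingual'):
    labels = Detoxify(name).predict('example text').keys()
    print(name, sorted(labels))
```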
Dependencies
- For inference:
- 🤗 Transformers
- ⚡ Pytorch Lightning
- For training:
- Kaggle API (to download data)
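A quick way to confirm the inference dependencies are installed and to record their versions (useful when reporting issues):

```python
# check that the inference dependencies import, and print their versions
import torch
import transformers
import pytorch_lightning

print('torch', torch.__version__)
print('transformers', transformers.__version__)
print('pytorch_lightning', pytorch_lightning.__version__)
```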
📦 Installation Guide
Clone the Project
```bash
# clone the project
git clone https://github.com/unitaryai/detoxify

# create and activate a virtual environment
python3 -m venv toxic-env
source toxic-env/bin/activate

# install detoxify in editable mode, then the training requirements
pip install -e detoxify
cd detoxify
pip install -r requirements.txt
```
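After installation, a one-line prediction confirms that the package imports and that model weights can be downloaded (the first call fetches the checkpoint):

```python
# quick post-install check: load the 'original' model and score a sentence
from detoxify import Detoxify

print(Detoxify('original').predict('hello world'))
```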
💻 Usage Examples
Basic Usage
```python
from detoxify import Detoxify

results = Detoxify('original').predict('example text')
```
Advanced Usage
```python
from detoxify import Detoxify
import pandas as pd

input_text = ['example text', 'exemple de texte', 'texto de ejemplo', 'testo di esempio', 'texto de exemplo', 'örnek metin', 'пример текста']
results = Detoxify('multilingual').predict(input_text)

print(pd.DataFrame(results, index=input_text).round(5))
```
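For larger batches, inference can also be run on a GPU. The sketch below assumes the installed Detoxify release accepts a `device` argument (passed through to PyTorch); if in doubt, drop the argument and the model runs on CPU by default.

```python
import torch
from detoxify import Detoxify

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# allocate the model on the chosen device (assumes the `device` keyword is supported)
model = Detoxify('unbiased', device=device)
results = model.predict(['example text 1', 'example text 2'])
print(results)
```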
📚 Documentation
Model Details
| Property | Details |
| --- | --- |
| Model Type | `original`: bert-base-uncased; `unbiased`: roberta-base; `multilingual`: xlm-roberta-base |
| Training Data | Toxic Comment Classification Challenge, Unintended Bias in Toxicity Classification, Multilingual Toxic Comment Classification |
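Because each checkpoint fine-tunes a standard 🤗 Transformers backbone (see the table above), the matching tokenizer can be loaded directly, which is occasionally useful for inspecting how a comment gets tokenized. A sketch using only the backbone names from the table:

```python
from transformers import AutoTokenizer

# backbone transformer behind each Detoxify variant (from the table above)
backbones = {
    'original': 'bert-base-uncased',
    'unbiased': 'roberta-base',
    'multilingual': 'xlm-roberta-base',
}

tokenizer = AutoTokenizer.from_pretrained(backbones['multilingual'])
print(tokenizer.tokenize('пример текста'))
```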
Labels
All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according to the following schema:
- Very Toxic: A very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective.
- Toxic: A rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective.
- Hard to Say
- Not Toxic
More information about the labelling schema can be found here.
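The per-comment toxicity value used as a training target is derived from these annotator ratings. The sketch below shows one straightforward aggregation, the fraction of annotators who rated the comment Toxic or Very Toxic; it is an illustration of the idea rather than the exact scheme used to build the Jigsaw datasets:

```python
def toxicity_target(ratings):
    """Aggregate up to 10 annotator ratings into a fractional toxicity score."""
    toxic_votes = sum(r in ('toxic', 'very_toxic') for r in ratings)
    return toxic_votes / len(ratings)

# 2 of 4 annotators rated the comment as toxic -> target of 0.5
print(toxicity_target(['toxic', 'not_toxic', 'very_toxic', 'hard_to_say']))
```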
Prediction
```bash
# predict with a pretrained model (by name), from a local checkpoint,
# or on a whole file with results saved to a csv; --help lists all options
python run_prediction.py --input 'example' --model_name original
python run_prediction.py --input 'example' --from_ckpt_path model_path
python run_prediction.py --input test_set.txt --model_name original --save_to results.csv
python run_prediction.py --help
```
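The same batch workflow can be reproduced with the library API instead of the script. A sketch assuming a plain text file with one comment per line (file names here are illustrative):

```python
import pandas as pd
from detoxify import Detoxify

# read one comment per line
with open('test_set.txt', encoding='utf-8') as f:
    texts = [line.strip() for line in f if line.strip()]

results = Detoxify('original').predict(texts)

# one row per comment, one column per label
pd.DataFrame(results, index=texts).round(5).to_csv('results.csv')
```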
Training
```bash
# download the datasets from Kaggle (requires a configured Kaggle API token)
mkdir jigsaw_data
cd jigsaw_data
kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification
kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification

# create a validation split
python create_val_set.py

# train one model per challenge (the multilingual model has a second training stage)
python train.py --config configs/Toxic_comment_classification_BERT.json
python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa.json
python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json
python train.py --config configs/Multilingual_toxic_comment_classification_XLMR_stage2.json

# monitor training progress with tensorboard
tensorboard --logdir=./saved
```
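Before training, it can help to sanity-check the downloaded data. A sketch assuming the Toxic Comment Classification Challenge layout, i.e. a `train.csv` with a `comment_text` column and binary label columns; adjust the path to wherever the Kaggle archives were extracted:

```python
import pandas as pd

# path is an assumption; point it at the extracted Kaggle archive
train = pd.read_csv('jigsaw_data/train.csv')

print(train.shape)
print(train.columns.tolist())

# class balance of the binary label columns (everything except id and comment_text)
label_cols = [c for c in train.columns if c not in ('id', 'comment_text')]
print(train[label_cols].mean().round(4))
```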
Model Evaluation
Toxic Comment Classification Challenge:
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
Unintended Bias in Toxicity Classification (run the bias metric script afterwards):
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
python model_eval/compute_bias_metric.py
```
Multilingual Toxic Comment Classification:
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
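The Jigsaw challenges score submissions with ROC-AUC-based metrics. For a quick check outside the evaluation script, a per-label ROC AUC can be computed with scikit-learn; the sketch assumes a test CSV with a `comment_text` column and binary label columns whose names match the model's output labels (both assumptions):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score
from detoxify import Detoxify

test = pd.read_csv('test.csv')
preds = Detoxify('original').predict(test['comment_text'].tolist())

# ROC AUC per label, only where the test set has a matching ground-truth column
for label, scores in preds.items():
    if label in test.columns:
        print(label, round(roc_auc_score(test[label], scores), 4))
```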
🔧 Technical Details
Limitations and Ethical Considerations
If words associated with swearing, insults, or profanity are present in a comment, it is likely to be classified as toxic regardless of the author's tone or intent (e.g. humorous or self-deprecating). This could present biases towards already vulnerable minority groups.
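A simple way to observe this limitation is to score two comments with the same intent, one phrased with insulting language and one without; the exact scores depend on the checkpoint, so none are claimed here:

```python
from detoxify import Detoxify

model = Detoxify('original')

# same self-deprecating intent, with and without insulting wording;
# compare the toxicity scores the model assigns to each
pair = ["I'm such a useless idiot today", "I'm really not at my best today"]
print(model.predict(pair))
```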
Some useful resources about the risk of different biases in toxicity or hate speech detection are:
📄 License
This project is licensed under the Apache 2.0 license.
Citation
```bibtex
@misc{Detoxify,
  title={Detoxify},
  author={Hanu, Laura and {Unitary team}},
  howpublished={Github. https://github.com/unitaryai/detoxify},
  year={2020}
}
```
Disclaimer
⚠️ Important Note
The Hugging Face hub models currently give different results from the detoxify library (see issue here). For the most up-to-date models, we recommend using the models from https://github.com/unitaryai/detoxify.
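To compare the two sources directly, the library model and a hub checkpoint can be loaded side by side. A sketch assuming `unitary/toxic-bert` is the hub counterpart of the `original` model (check the model card if the id differs):

```python
from detoxify import Detoxify
from transformers import pipeline

text = 'example text'

# scores from the detoxify library
print(Detoxify('original').predict(text))

# scores from the Hugging Face hub checkpoint; as noted above,
# these may differ from the library output
hf_classifier = pipeline('text-classification', model='unitary/toxic-bert', top_k=None)
print(hf_classifier(text))
```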