🚀 🙊 Detoxify
Toxic Comment Classification with ⚡ Pytorch Lightning and 🤗 Transformers
This project provides trained models and code to predict toxic comments in three Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, and Multilingual toxic comment classification.
🚀 Quick Start
Installation
pip install detoxify
Quick Prediction
```python
from detoxify import Detoxify

# each model takes either a string or a list of strings
results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1', 'example text 2'])

input_text = ['example text', 'exemple de texte', 'texto de ejemplo', 'testo di esempio', 'texto de exemplo', 'örnek metin', 'пример текста']
results = Detoxify('multilingual').predict(input_text)

# optional: display results nicely (requires pandas)
import pandas as pd
print(pd.DataFrame(results, index=input_text).round(5))
```
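`predict` returns a plain Python dictionary mapping each label to a score (or to a list of scores when a list of inputs is given), so results can also be inspected without pandas. A minimal sketch; the `toxicity` key used below is an assumption about the label names of the installed checkpoint:

```python
from detoxify import Detoxify

results = Detoxify('original').predict(['you are amazing', 'you are pathetic'])

# results maps each label to one score per input text
for label, scores in results.items():
    print(label, [round(float(s), 4) for s in scores])

# e.g. flag inputs above an arbitrary illustrative threshold of 0.5
flagged = [i for i, s in enumerate(results['toxicity']) if s > 0.5]
print('flagged indices:', flagged)
```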
✨ Features
Model Capabilities
- Trained on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification (see the sketch after this list for how to inspect the labels each variant returns).
- The multilingual model is trained on 7 languages: English, French, Spanish, Italian, Portuguese, Turkish, and Russian.
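The labels each variant predicts depend on the challenge it was trained on. A quick way to inspect them (a sketch; each call downloads the corresponding checkpoint on first use):

```python
from detoxify import Detoxify

# print the label names returned by each pretrained variant
for name in ('original', 'unbiased', 'multilingual'):
    labels = Detoxify(name).predict('example text').keys()
    print(name, sorted(labels))
```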
Dependencies
- For inference:
- 🤗 Transformers
- ⚡ Pytorch Lightning
- For training:
- Kaggle API (to download data)
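A quick way to confirm the inference dependencies are installed and to record their versions (useful when reporting issues):

```python
# check that the inference dependencies import, and print their versions
import torch
import transformers
import pytorch_lightning

print('torch', torch.__version__)
print('transformers', transformers.__version__)
print('pytorch_lightning', pytorch_lightning.__version__)
```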
📦 Installation Guide
Clone the Project
```bash
# clone the project
git clone https://github.com/unitaryai/detoxify

# create and activate a virtual environment
python3 -m venv toxic-env
source toxic-env/bin/activate

# install detoxify in editable mode, then the training requirements
pip install -e detoxify
cd detoxify
pip install -r requirements.txt
```
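After installation, a one-line prediction confirms that the package imports and that model weights can be downloaded (the first call fetches the checkpoint):

```python
# quick post-install check: load the 'original' model and score a sentence
from detoxify import Detoxify

print(Detoxify('original').predict('hello world'))
```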
💻 Usage Examples
Basic Usage
```python
from detoxify import Detoxify

results = Detoxify('original').predict('example text')
```
Advanced Usage
```python
from detoxify import Detoxify
import pandas as pd

input_text = ['example text', 'exemple de texte', 'texto de ejemplo', 'testo di esempio', 'texto de exemplo', 'örnek metin', 'пример текста']
results = Detoxify('multilingual').predict(input_text)

print(pd.DataFrame(results, index=input_text).round(5))
```
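For larger batches, inference can also be run on a GPU. The sketch below assumes the installed Detoxify release accepts a `device` argument (passed through to PyTorch); if in doubt, drop the argument and the model runs on CPU by default.

```python
import torch
from detoxify import Detoxify

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# allocate the model on the chosen device (assumes the `device` keyword is supported)
model = Detoxify('unbiased', device=device)
results = model.predict(['example text 1', 'example text 2'])
print(results)
```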
📚 Documentation
Model Details
| Property | Details |
| --- | --- |
| Model Type | `original`: bert-base-uncased; `unbiased`: roberta-base; `multilingual`: xlm-roberta-base |
| Training Data | Toxic Comment Classification Challenge, Unintended Bias in Toxicity Classification, Multilingual Toxic Comment Classification |
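Because each checkpoint fine-tunes a standard 🤗 Transformers backbone (see the table above), the matching tokenizer can be loaded directly, which is occasionally useful for inspecting how a comment gets tokenized. A sketch using only the backbone names from the table:

```python
from transformers import AutoTokenizer

# backbone transformer behind each Detoxify variant (from the table above)
backbones = {
    'original': 'bert-base-uncased',
    'unbiased': 'roberta-base',
    'multilingual': 'xlm-roberta-base',
}

tokenizer = AutoTokenizer.from_pretrained(backbones['multilingual'])
print(tokenizer.tokenize('пример текста'))
```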
Labels
All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according to the following schema:
- Very Toxic: A very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective.
- Toxic: A rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective.
- Hard to Say
- Not Toxic
More information about the labelling schema can be found here.
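The per-comment toxicity value used as a training target is derived from these annotator ratings. The sketch below shows one straightforward aggregation, the fraction of annotators who rated the comment Toxic or Very Toxic; it is an illustration of the idea rather than the exact scheme used to build the Jigsaw datasets:

```python
def toxicity_target(ratings):
    """Aggregate up to 10 annotator ratings into a fractional toxicity score."""
    toxic_votes = sum(r in ('toxic', 'very_toxic') for r in ratings)
    return toxic_votes / len(ratings)

# 2 of 4 annotators rated the comment as toxic -> target of 0.5
print(toxicity_target(['toxic', 'not_toxic', 'very_toxic', 'hard_to_say']))
```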
Prediction
```bash
# predict with a pretrained model (by name), from a local checkpoint,
# or on a whole file with results saved to a csv; --help lists all options
python run_prediction.py --input 'example' --model_name original
python run_prediction.py --input 'example' --from_ckpt_path model_path
python run_prediction.py --input test_set.txt --model_name original --save_to results.csv
python run_prediction.py --help
```
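The same batch workflow can be reproduced with the library API instead of the script. A sketch assuming a plain text file with one comment per line (file names here are illustrative):

```python
import pandas as pd
from detoxify import Detoxify

# read one comment per line
with open('test_set.txt', encoding='utf-8') as f:
    texts = [line.strip() for line in f if line.strip()]

results = Detoxify('original').predict(texts)

# one row per comment, one column per label
pd.DataFrame(results, index=texts).round(5).to_csv('results.csv')
```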
Training
```bash
# download the datasets from Kaggle (requires a configured Kaggle API token)
mkdir jigsaw_data
cd jigsaw_data
kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification
kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification

# create a validation split
python create_val_set.py

# train one model per challenge (the multilingual model has a second training stage)
python train.py --config configs/Toxic_comment_classification_BERT.json
python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa.json
python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json
python train.py --config configs/Multilingual_toxic_comment_classification_XLMR_stage2.json

# monitor training progress with tensorboard
tensorboard --logdir=./saved
```
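Before training, it can help to sanity-check the downloaded data. A sketch assuming the Toxic Comment Classification Challenge layout, i.e. a `train.csv` with a `comment_text` column and binary label columns; adjust the path to wherever the Kaggle archives were extracted:

```python
import pandas as pd

# path is an assumption; point it at the extracted Kaggle archive
train = pd.read_csv('jigsaw_data/train.csv')

print(train.shape)
print(train.columns.tolist())

# class balance of the binary label columns (everything except id and comment_text)
label_cols = [c for c in train.columns if c not in ('id', 'comment_text')]
print(train[label_cols].mean().round(4))
```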
Model Evaluation
Toxic Comment Classification Challenge:
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
Unintended Bias in Toxicity Classification (run the bias metric script afterwards):
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
python model_eval/compute_bias_metric.py
```
Multilingual Toxic Comment Classification:
```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
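The Jigsaw challenges score submissions with ROC-AUC-based metrics. For a quick check outside the evaluation script, a per-label ROC AUC can be computed with scikit-learn; the sketch assumes a test CSV with a `comment_text` column and binary label columns whose names match the model's output labels (both assumptions):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score
from detoxify import Detoxify

test = pd.read_csv('test.csv')
preds = Detoxify('original').predict(test['comment_text'].tolist())

# ROC AUC per label, only where the test set has a matching ground-truth column
for label, scores in preds.items():
    if label in test.columns:
        print(label, round(roc_auc_score(test[label], scores), 4))
```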
🔧 Technical Details
Limitations and Ethical Considerations
If words associated with swearing, insults, or profanity are present in a comment, it is likely to be classified as toxic regardless of the author's tone or intent (e.g. humorous or self-deprecating). This could present biases towards already vulnerable minority groups.
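A simple way to observe this limitation is to score two comments with the same intent, one phrased with insulting language and one without; the exact scores depend on the checkpoint, so none are claimed here:

```python
from detoxify import Detoxify

model = Detoxify('original')

# same self-deprecating intent, with and without insulting wording;
# compare the toxicity scores the model assigns to each
pair = ["I'm such a useless idiot today", "I'm really not at my best today"]
print(model.predict(pair))
```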
Some useful resources about the risk of different biases in toxicity or hate speech detection are:
📄 License
This project is licensed under the Apache 2.0 license.
Citation
```bibtex
@misc{Detoxify,
  title={Detoxify},
  author={Hanu, Laura and {Unitary team}},
  howpublished={Github. https://github.com/unitaryai/detoxify},
  year={2020}
}
```
Disclaimer
⚠️ Important Note
The Hugging Face hub models currently give different results from the detoxify library (see issue here). For the most up-to-date models, we recommend using the models from https://github.com/unitaryai/detoxify.
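To compare the two sources directly, the library model and a hub checkpoint can be loaded side by side. A sketch assuming `unitary/toxic-bert` is the hub counterpart of the `original` model (check the model card if the id differs):

```python
from detoxify import Detoxify
from transformers import pipeline

text = 'example text'

# scores from the detoxify library
print(Detoxify('original').predict(text))

# scores from the Hugging Face hub checkpoint; as noted above,
# these may differ from the library output
hf_classifier = pipeline('text-classification', model='unitary/toxic-bert', top_k=None)
print(hf_classifier(text))
```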