🚀 Toxic Comment Classification Model
This model is a fine-tuned DistilBERT model for classifying toxic comments, offering an effective way to identify toxic content in online text.
🚀 Quick Start
You can start using this model with just a few lines of code, as shown in the usage examples below.
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model_path = "martin-ha/toxic-comment-model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

# Wrap them in a text-classification pipeline and classify a sample comment.
pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer)
print(pipeline('This is a test text.'))
```
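The pipeline also accepts a list of comments. As a small follow-up sketch (reusing the `pipeline` object defined above), each result is a dict with a predicted label and a confidence score; the exact label strings depend on the model's configuration:

```python
# Classify several comments at once and print the predicted label and score.
comments = ["Thanks for the helpful answer!", "You are an idiot."]
for comment, result in zip(comments, pipeline(comments)):
    print(comment, "->", result["label"], round(result["score"], 3))
```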
📚 Documentation
Limitations and Bias
This model is intended to classify toxic online comments. However, it has a known limitation: it performs poorly on some comments that mention specific identity subgroups, such as Muslims. The table below shows the evaluation scores for different identity groups. You can learn the specific meaning of these metrics here. In general, these metrics measure how well the model performs for a specific subgroup; the larger the number, the better.
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned DistilBERT for toxic comment classification |
| Training Data | 10% of the train.csv data from the Kaggle competition |
| Subgroup | Subgroup Size | Subgroup AUC | BPSN AUC | BNSP AUC |
|----------|---------------|--------------|----------|----------|
| muslim | 108 | 0.689 | 0.811 | 0.88 |
| jewish | 40 | 0.749 | 0.86 | 0.825 |
| homosexual_gay_or_lesbian | 56 | 0.795 | 0.706 | 0.972 |
| black | 84 | 0.866 | 0.758 | 0.975 |
| white | 112 | 0.876 | 0.784 | 0.97 |
| female | 306 | 0.898 | 0.887 | 0.948 |
| christian | 231 | 0.904 | 0.917 | 0.93 |
| male | 225 | 0.922 | 0.862 | 0.967 |
| psychiatric_or_mental_illness | 26 | 0.924 | 0.907 | 0.95 |
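These per-subgroup bias metrics appear to follow the definitions used in the Jigsaw Unintended Bias in Toxicity Classification competition. Below is a minimal sketch of how they can be computed with scikit-learn; the variable names are illustrative, and this is not the exact evaluation code used to produce the numbers above:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(y_true, y_score, in_subgroup):
    """AUC restricted to comments that mention the identity subgroup."""
    return roc_auc_score(y_true[in_subgroup], y_score[in_subgroup])

def bpsn_auc(y_true, y_score, in_subgroup):
    """Background Positive, Subgroup Negative: non-toxic subgroup comments mixed
    with toxic background comments. Low values mean the model tends to flag
    harmless comments that mention the subgroup."""
    mask = (in_subgroup & (y_true == 0)) | (~in_subgroup & (y_true == 1))
    return roc_auc_score(y_true[mask], y_score[mask])

def bnsp_auc(y_true, y_score, in_subgroup):
    """Background Negative, Subgroup Positive: toxic subgroup comments mixed
    with non-toxic background comments. Low values mean the model misses toxic
    comments about the subgroup."""
    mask = (in_subgroup & (y_true == 1)) | (~in_subgroup & (y_true == 0))
    return roc_auc_score(y_true[mask], y_score[mask])

# Toy example: y_true are gold labels (1 = toxic), y_score are model
# probabilities, in_subgroup marks comments mentioning a given identity.
y_true = np.array([0, 1, 0, 1, 0, 1])
y_score = np.array([0.2, 0.9, 0.6, 0.8, 0.1, 0.7])
in_subgroup = np.array([True, True, True, False, False, False])
print(subgroup_auc(y_true, y_score, in_subgroup),
      bpsn_auc(y_true, y_score, in_subgroup),
      bnsp_auc(y_true, y_score, in_subgroup))
```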
The table above shows that the model performs poorly for the Muslim and Jewish groups. In fact, if you pass the sentence "Muslims are people who follow or practice Islam, an Abrahamic monotheistic religion." into the model, it will classify it as toxic. Be aware of this type of potential bias.
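As an illustration, you can reproduce this check with the same pipeline shown in the usage example; the reported classification comes from the model card, and your exact score may differ:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_path = "martin-ha/toxic-comment-model"
pipeline = TextClassificationPipeline(
    model=AutoModelForSequenceClassification.from_pretrained(model_path),
    tokenizer=AutoTokenizer.from_pretrained(model_path),
)

# The model card reports that this neutral, factual sentence is labeled toxic.
sentence = "Muslims are people who follow or practice Islam, an Abrahamic monotheistic religion."
print(pipeline(sentence))
```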
⚠️ Important Note
The model may have performance issues when dealing with comments related to specific identity subgroups.
Training Data
The training data comes from this Kaggle competition. We used 10% of the train.csv data to train the model.
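The exact sampling and preprocessing steps are not described here, so the following is only a minimal sketch under the assumption that train.csv has a `comment_text` column and a fractional `target` toxicity score (as in the Jigsaw competition data) and that the 10% subset was drawn at random:

```python
import pandas as pd

# Load the competition training data (path and columns are assumptions).
df = pd.read_csv("train.csv")

# Draw a random 10% sample for training, as described above.
train_df = df.sample(frac=0.1, random_state=42)

# Binarize the fractional toxicity score at 0.5 to get classification labels.
train_df["label"] = (train_df["target"] >= 0.5).astype(int)
print(train_df[["comment_text", "label"]].head())
```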
Training Procedure
You can refer to this documentation and code to understand how we trained the model. Training takes about 3 hours on a P100 GPU.
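For orientation, here is a minimal fine-tuning sketch using the Hugging Face Trainer, building on the `train_df` DataFrame from the sampling sketch above. The hyperparameters are illustrative assumptions, not the authors' exact settings:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from the pretrained DistilBERT checkpoint with a 2-class head.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# Build a Dataset from the sampled DataFrame and tokenize the comments.
dataset = Dataset.from_pandas(train_df[["comment_text", "label"]])
dataset = dataset.map(
    lambda batch: tokenizer(batch["comment_text"], truncation=True, max_length=128),
    batched=True,
)
dataset = dataset.train_test_split(test_size=0.1)

# Illustrative training settings; tune these for your own runs.
args = TrainingArguments(
    output_dir="toxic-comment-model",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```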
Evaluation Results
The model achieves 94% accuracy and a 0.59 F1 score on a held-out test set of 10,000 rows.
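The held-out test set itself is not distributed with the model, but a minimal sketch of this kind of evaluation with scikit-learn looks like the following; the `texts` and `labels` lists stand in for your own test data, and the label strings are assumptions about this model's configuration:

```python
from sklearn.metrics import accuracy_score, f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_path = "martin-ha/toxic-comment-model"
pipe = TextClassificationPipeline(
    model=AutoModelForSequenceClassification.from_pretrained(model_path),
    tokenizer=AutoTokenizer.from_pretrained(model_path),
)

# Placeholder test data: replace with the real held-out comments and labels.
texts = ["This is a test text.", "You are an idiot."]
labels = [0, 1]  # 0 = non-toxic, 1 = toxic

# Map predicted label strings to binary labels (label names are an assumption).
preds = [1 if out["label"] == "toxic" else 0 for out in pipe(texts)]
print("accuracy:", accuracy_score(labels, preds))
print("f1:", f1_score(labels, preds))
```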