🚀 Toxic Comment Classification Model
This model is a fine-tuned DistilBERT model for classifying toxic comments, offering an effective way to identify toxic content in online text.
🚀 Quick Start
You can start using this model with just a few lines of code, as shown in the usage examples below.
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model_path = "martin-ha/toxic-comment-model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

# Wrap them in a text-classification pipeline and classify a sample comment.
pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer)
print(pipeline('This is a test text.'))
```
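The pipeline also accepts a list of comments. As a small follow-up sketch (reusing the `pipeline` object defined above), each result is a dict with a predicted label and a confidence score; the exact label strings depend on the model's configuration:

```python
# Classify several comments at once and print the predicted label and score.
comments = ["Thanks for the helpful answer!", "You are an idiot."]
for comment, result in zip(comments, pipeline(comments)):
    print(comment, "->", result["label"], round(result["score"], 3))
```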
📚 Documentation
Limitations and Bias
This model is intended to classify toxic online comments. However, it has a known limitation: it performs poorly on some comments that mention specific identity subgroups, such as Muslims. The table below shows the evaluation scores for different identity groups. You can learn the specific meaning of these metrics here. In general, these metrics measure how well the model performs for a specific subgroup; the larger the number, the better.
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned DistilBERT for toxic comment classification |
| Training Data | 10% of the train.csv data from the Kaggle competition |
| Subgroup | Subgroup Size | Subgroup AUC | BPSN AUC | BNSP AUC |
|----------|---------------|--------------|----------|----------|
| muslim | 108 | 0.689 | 0.811 | 0.88 |
| jewish | 40 | 0.749 | 0.86 | 0.825 |
| homosexual_gay_or_lesbian | 56 | 0.795 | 0.706 | 0.972 |
| black | 84 | 0.866 | 0.758 | 0.975 |
| white | 112 | 0.876 | 0.784 | 0.97 |
| female | 306 | 0.898 | 0.887 | 0.948 |
| christian | 231 | 0.904 | 0.917 | 0.93 |
| male | 225 | 0.922 | 0.862 | 0.967 |
| psychiatric_or_mental_illness | 26 | 0.924 | 0.907 | 0.95 |
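These per-subgroup bias metrics appear to follow the definitions used in the Jigsaw Unintended Bias in Toxicity Classification competition. Below is a minimal sketch of how they can be computed with scikit-learn; the variable names are illustrative, and this is not the exact evaluation code used to produce the numbers above:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(y_true, y_score, in_subgroup):
    """AUC restricted to comments that mention the identity subgroup."""
    return roc_auc_score(y_true[in_subgroup], y_score[in_subgroup])

def bpsn_auc(y_true, y_score, in_subgroup):
    """Background Positive, Subgroup Negative: non-toxic subgroup comments mixed
    with toxic background comments. Low values mean the model tends to flag
    harmless comments that mention the subgroup."""
    mask = (in_subgroup & (y_true == 0)) | (~in_subgroup & (y_true == 1))
    return roc_auc_score(y_true[mask], y_score[mask])

def bnsp_auc(y_true, y_score, in_subgroup):
    """Background Negative, Subgroup Positive: toxic subgroup comments mixed
    with non-toxic background comments. Low values mean the model misses toxic
    comments about the subgroup."""
    mask = (in_subgroup & (y_true == 1)) | (~in_subgroup & (y_true == 0))
    return roc_auc_score(y_true[mask], y_score[mask])

# Toy example: y_true are gold labels (1 = toxic), y_score are model
# probabilities, in_subgroup marks comments mentioning a given identity.
y_true = np.array([0, 1, 0, 1, 0, 1])
y_score = np.array([0.2, 0.9, 0.6, 0.8, 0.1, 0.7])
in_subgroup = np.array([True, True, True, False, False, False])
print(subgroup_auc(y_true, y_score, in_subgroup),
      bpsn_auc(y_true, y_score, in_subgroup),
      bnsp_auc(y_true, y_score, in_subgroup))
```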
The table above shows that the model performs poorly for the Muslim and Jewish groups. In fact, if you pass the sentence "Muslims are people who follow or practice Islam, an Abrahamic monotheistic religion." into the model, it will classify it as toxic. Be aware of this type of potential bias.
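As an illustration, you can reproduce this check with the same pipeline shown in the usage example; the reported classification comes from the model card, and your exact score may differ:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_path = "martin-ha/toxic-comment-model"
pipeline = TextClassificationPipeline(
    model=AutoModelForSequenceClassification.from_pretrained(model_path),
    tokenizer=AutoTokenizer.from_pretrained(model_path),
)

# The model card reports that this neutral, factual sentence is labeled toxic.
sentence = "Muslims are people who follow or practice Islam, an Abrahamic monotheistic religion."
print(pipeline(sentence))
```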
⚠️ Important Note
The model may have performance issues when dealing with comments related to specific identity subgroups.
Training Data
The training data comes from this Kaggle competition. We used 10% of the train.csv data to train the model.
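The exact sampling and preprocessing steps are not described here, so the following is only a minimal sketch under the assumption that train.csv has a `comment_text` column and a fractional `target` toxicity score (as in the Jigsaw competition data) and that the 10% subset was drawn at random:

```python
import pandas as pd

# Load the competition training data (path and columns are assumptions).
df = pd.read_csv("train.csv")

# Draw a random 10% sample for training, as described above.
train_df = df.sample(frac=0.1, random_state=42)

# Binarize the fractional toxicity score at 0.5 to get classification labels.
train_df["label"] = (train_df["target"] >= 0.5).astype(int)
print(train_df[["comment_text", "label"]].head())
```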
Training Procedure
You can refer to this documentation and code to understand how we trained the model. Training takes about 3 hours on a P100 GPU.
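For orientation, here is a minimal fine-tuning sketch using the Hugging Face Trainer, building on the `train_df` DataFrame from the sampling sketch above. The hyperparameters are illustrative assumptions, not the authors' exact settings:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from the pretrained DistilBERT checkpoint with a 2-class head.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# Build a Dataset from the sampled DataFrame and tokenize the comments.
dataset = Dataset.from_pandas(train_df[["comment_text", "label"]])
dataset = dataset.map(
    lambda batch: tokenizer(batch["comment_text"], truncation=True, max_length=128),
    batched=True,
)
dataset = dataset.train_test_split(test_size=0.1)

# Illustrative training settings; tune these for your own runs.
args = TrainingArguments(
    output_dir="toxic-comment-model",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```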
Evaluation Results
The model achieves 94% accuracy and a 0.59 F1 score on a held-out test set of 10,000 rows.
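The held-out test set itself is not distributed with the model, but a minimal sketch of this kind of evaluation with scikit-learn looks like the following; the `texts` and `labels` lists stand in for your own test data, and the label strings are assumptions about this model's configuration:

```python
from sklearn.metrics import accuracy_score, f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_path = "martin-ha/toxic-comment-model"
pipe = TextClassificationPipeline(
    model=AutoModelForSequenceClassification.from_pretrained(model_path),
    tokenizer=AutoTokenizer.from_pretrained(model_path),
)

# Placeholder test data: replace with the real held-out comments and labels.
texts = ["This is a test text.", "You are an idiot."]
labels = [0, 1]  # 0 = non-toxic, 1 = toxic

# Map predicted label strings to binary labels (label names are an assumption).
preds = [1 if out["label"] == "toxic" else 0 for out in pipe(texts)]
print("accuracy:", accuracy_score(labels, preds))
print("f1:", f1_score(labels, preds))
```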