BERT-base-uncased-hatexplain-rationale-two
This model is designed for text classification, specifically classifying text as Abusive (Hatespeech and Offensive) or Normal. It utilizes data from Gab and Twitter, and incorporates Human Rationales to enhance performance.
🚀 Quick Start
Details of usage
Please use the Model_Rational_Label class inside models.py to load the model. The default prediction from the hosted inference API may be wrong because it uses a different class initialisation.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from models import *  # models.py from the model repository provides Model_Rational_Label

tokenizer = AutoTokenizer.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two")
model = Model_Rational_Label.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two")

# Tokenize a single sentence and run it through the model; the first return value
# holds the classification logits, the second output is discussed under Features.
inputs = tokenizer("He is a great guy", return_tensors="pt")
prediction_logits, _ = model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
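The forward pass returns raw classification logits. A minimal post-processing sketch, continuing from the snippet above and not part of the original card, converts them to probabilities; the label-index order used here is an assumption that should be verified against the model configuration:

import torch

# Turn the two-class logits into probabilities and pick the most likely class.
# Assumed label order: index 0 = Normal, index 1 = Abusive (verify against the model config).
probabilities = torch.softmax(prediction_logits, dim=-1)
predicted_class = probabilities.argmax(dim=-1).item()
print(probabilities, predicted_class)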
✨ Features
- The model classifies text as Abusive (Hatespeech and Offensive) or Normal.
- It is trained on data from Gab and Twitter, with Human Rationales included in the training data to boost performance.
- The model also has a rationale predictor head that can predict the rationales given an abusive sentence.
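The Quick Start snippet discards the second return value of the forward pass; given the description above, that value presumably comes from the rationale predictor head. The sketch below is an assumption about that output (its shape and the meaning of its last dimension are not documented in the original card) and shows how per-token rationale scores could be inspected:

import torch
from transformers import AutoTokenizer
from models import Model_Rational_Label  # models.py from the model repository

tokenizer = AutoTokenizer.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two")
model = Model_Rational_Label.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two")

inputs = tokenizer("He is a great guy", return_tensors="pt")
prediction_logits, rationale_logits = model(
    input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
)

# Assumed shape of rationale_logits: (batch, sequence_length, 2), i.e. one
# rationale/non-rationale score pair per token; index 1 is assumed to mean "rationale".
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
token_scores = torch.softmax(rationale_logits, dim=-1)[0, :, 1]
for token, score in zip(tokens, token_scores.tolist()):
    print(f"{token}\t{score:.3f}")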
📦 Installation
No specific installation steps are given in the original document. The model is loaded through the Hugging Face transformers library together with the models.py file from the model repository, as shown in the Usage Examples.
💻 Usage Examples
Basic Usage
The basic usage is identical to the Quick Start example above: load the tokenizer with AutoTokenizer, load the model with Model_Rational_Label from models.py, tokenize the input sentence, and pass input_ids and attention_mask to the model to obtain the prediction logits.
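As an additional, hedged example that is not part of the original document, the same pipeline can likely be applied to several sentences at once by letting the tokenizer pad them to a common length; the label-index mapping is again an assumption to verify against the model configuration:

import torch
from transformers import AutoTokenizer
from models import Model_Rational_Label  # models.py from the model repository

tokenizer = AutoTokenizer.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two")
model = Model_Rational_Label.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two")

texts = ["He is a great guy", "Have a nice day"]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    prediction_logits, _ = model(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )

# Assumed label order: index 0 = Normal, index 1 = Abusive (verify against the model config).
probabilities = torch.softmax(prediction_logits, dim=-1)
for text, probs in zip(texts, probabilities.tolist()):
    print(text, "->", probs)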
📚 Documentation
Model Details
Uses
Direct Use
This model can be used for Text Classification.
Downstream Use
[More information needed]
Misuse and Out-of-scope Use
The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.
Risks, Limitations and Biases
⚠️ Important Note
Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).
Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
The model authors also note in their HateXplain paper that they
have not considered any external context such as profile bio, user gender, history of posts etc., which might be helpful in the classification task. Also, in this work we have focused on the English language. It does not consider multilingual hate speech into account.
Training
The authors detail their preprocessing procedure in the GitHub repository.
Evaluation
The model authors detail the hidden layer size and attention for the HateXplain fine-tuned models in the associated paper.
Results
Both in their paper and in the GitHub repository, the model authors provide illustrative output of BERT-HateXplain in comparison to BERT and other HateXplain fine-tuned models.
Citation Information
@article{mathew2020hatexplain,
title={HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection},
author={Mathew, Binny and Saha, Punyajoy and Yimam, Seid Muhie and Biemann, Chris and Goyal, Pawan and Mukherjee, Animesh},
journal={arXiv preprint arXiv:2012.10289},
year={2020}
}
📄 License
This project is licensed under the Apache-2.0 license.