MetaHateBERT Open-source Text Classification Model - Free Deployment for Precise Detection of Hate Speech

MetaHateBERT

Developed by irlab-udc

MetaHateBERT is a text classification model based on the BERT architecture, specifically designed for detecting hate speech.

Text Classification

Transformers

English

Open Source License: Apache-2.0

#Hate speech detection #Social media content moderation #BERT fine-tuning

Downloads 1,456

Release Time: 6/17/2024

Model Overview

This model is based on the bert-base-uncased architecture and has been fine-tuned on a custom dataset to achieve binary text classification with labels of 'No hate' and 'Hate'.

Model Features

Hate speech detection

Specifically designed for detecting hate speech in social media comments, forums, and other text data sources.

Content moderation

Platforms can use this model to automatically flag potentially harmful content.

Model Capabilities

Text classification

Hate speech detection

Use Cases

Content moderation

Social media comment moderation

Automatically detect hate speech in social media comments

Flag potentially harmful content

Forum content filtering

Identify hate speech in forum posts

Help maintain a healthy discussion environment

🚀 MetaHateBERT

This is a fine-tuned BERT model designed to detect hate speech in text. It targets hate speech detection and content moderation, with the aim of improving the safety and quality of online text data.

🚀 Quick Start

To use this model, you can load it via the transformers library:

from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="irlab-udc/MetaHateBERT")

# Test the model
result = classifier("Your input text here")
print(result)  # Prints a list of dicts, each with a predicted "label" and its confidence "score"
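When scoring many comments at once, it can help to send them in fixed-size chunks and to truncate inputs that exceed BERT's 512-token limit. The sketch below is one possible way to do that, not part of the official model card: `batched`, `classify_all`, and `load_classifier` are hypothetical helper names, and passing `truncation=True` assumes the pipeline forwards that keyword to the tokenizer.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def classify_all(
    texts: Sequence[str],
    classify: Callable[..., List[Dict[str, object]]],
    batch_size: int = 32,
) -> List[Dict[str, object]]:
    """Run `classify` over `texts` chunk by chunk and collect the predictions."""
    results: List[Dict[str, object]] = []
    for batch in batched(texts, batch_size):
        # truncation=True asks the tokenizer to cut over-long inputs
        # instead of failing on them (assumed to be forwarded by the pipeline).
        results.extend(classify(batch, truncation=True))
    return results


def load_classifier():
    """Hypothetical helper: builds the pipeline shown in the Quick Start."""
    from transformers import pipeline  # imported lazily; downloads the model on first use
    return pipeline("text-classification", model="irlab-udc/MetaHateBERT")
```

Calling `classify_all(comments, load_classifier())` would return one prediction dict per comment, in the same order as the input.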

✨ Features

Model Description

This is a fine-tuned BERT model specifically designed to detect hate speech in text. The model is based on the bert-base-uncased architecture and has been fine-tuned on a custom dataset for binary text classification, with the labels "no hate" and "hate".

Intended Uses

Hate Speech Detection: This model is intended for detecting hate speech in social media comments, forums, and other text data sources.
Content Moderation: Can be used by platforms to automatically flag potentially harmful content.
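For the content moderation use case, a platform would typically flag only predictions above some confidence threshold and send them to human review. The following is a minimal sketch under stated assumptions: `flag_for_review` is a hypothetical helper, the threshold of 0.8 is an arbitrary illustration, and the exact label string for the hate class is not confirmed here, so check `classifier.model.config.id2label` before relying on it.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def flag_for_review(
    texts: Sequence[str],
    classify: Callable[[List[str]], List[Dict[str, object]]],
    hate_label: str,
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs predicted as `hate_label` with score >= threshold.

    `classify` is any callable that maps a list of strings to a list of
    {"label": ..., "score": ...} dicts, such as a text-classification pipeline.
    `hate_label` is an assumption supplied by the caller; the model's actual
    label names should be read from its configuration.
    """
    if not texts:
        return []
    predictions = classify(list(texts))
    flagged: List[Tuple[str, float]] = []
    for text, prediction in zip(texts, predictions):
        score = float(prediction["score"])
        if prediction["label"] == hate_label and score >= threshold:
            flagged.append((text, score))
    return flagged
```

A higher threshold flags fewer items but misses more borderline cases; the right trade-off depends on how much human review capacity the platform has.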

Limitations

Biases: The model may carry biases present in the training data.
False Positives/Negatives: It's not perfect and may misclassify some instances.
Domain Specificity: Performance may vary across different domains.
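Because performance may vary by domain, it is worth measuring the model on a small labeled sample of your own data before deploying it. The sketch below computes accuracy and F1 (the two metrics listed for this model) along with raw false positive and false negative counts. `accuracy_and_f1` is a hypothetical helper, and encoding the hate class as `1` is an assumption made for this example only.

```python
from typing import Dict, Sequence


def accuracy_and_f1(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    positive: int = 1,
) -> Dict[str, float]:
    """Compute accuracy and binary F1 for the `positive` class.

    Labels are plain integers; which integer stands for "hate" is an
    assumption of this sketch (1 by default).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

    return {
        "accuracy": correct / len(pairs),
        "f1": f1,
        "false_positives": float(fp),
        "false_negatives": float(fn),
    }
```

Comparing these numbers across sources (for example, forum posts versus social media comments) gives a rough picture of where the model misclassifies most often.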

📚 Documentation

Citation

If you use this model, please cite the following reference:

@article{Piot_Martín-Rodilla_Parapar_2024,
  title={MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection},
  volume={18},
  url={https://ojs.aaai.org/index.php/ICWSM/article/view/31445},
  DOI={10.1609/icwsm.v18i1.31445},
  abstractNote={Hate speech represents a pervasive and detrimental form of online discourse, often manifested through an array of slurs, from hateful tweets to defamatory posts. As such speech proliferates, it connects people globally and poses significant social, psychological, and occasionally physical threats to targeted individuals and communities. Current computational linguistic approaches for tackling this phenomenon rely on labelled social media datasets for training. For unifying efforts, our study advances in the critical need for a comprehensive meta-collection, advocating for an extensive dataset to help counteract this problem effectively. We scrutinized over 60 datasets, selectively integrating those pertinent into MetaHate. This paper offers a detailed examination of existing collections, highlighting their strengths and limitations. Our findings contribute to a deeper understanding of the existing datasets, paving the way for training more robust and adaptable models. These enhanced models are essential for effectively combating the dynamic and complex nature of hate speech in the digital realm.},
  number={1},
  journal={Proceedings of the International AAAI Conference on Web and Social Media},
  author={Piot, Paloma and Martín-Rodilla, Patricia and Parapar, Javier},
  year={2024},
  month={May},
  pages={2025-2039}
}

Acknowledgements

The authors thank the funding from the Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 101073351. The authors also thank the financial support supplied by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditation 2019-2022 ED431G/01, ED431B 2022/33) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruña as a Research Center of the Galician University System and the project PID2022-137061OB-C21 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Proyectos de Generación de Conocimiento; supported by the European Regional Development Fund). The authors also thank the funding of project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU).

📄 License

The model is licensed under the Apache 2.0 license.

Property	Details
Model Type	Fine-tuned BERT model for text classification
Training Data	irlab-udc/metahate
Metrics	accuracy, f1
Pipeline Tag	text-classification
Tags	hate speech
