HATE-ITA: Open-Source Hate Speech Classification Model for Italian Social Media

Developed by MilaNLProc

HATE-ITA is a binary hate speech classification model for Italian social media text. It is fine-tuned from the XLM-T model to identify insulting, hateful, and offensive language.

Text Classification

Transformers

Open Source License: GPL-3.0

Tags: Italian hate speech detection, multilingual model fine-tuning, social media text classification

Release Date: 6/8/2022

Model Overview

The model detects hate speech in Italian text and is suited to scenarios such as social media content moderation, where it flags insulting, hateful, and offensive language.

Model Features

Multilingual advantage

Trained on a large set of English data together with the available Italian datasets; the authors report that it outperforms monolingual models.

Language adaptability

Appears to adapt well to Italian-specific slurs and expressions.

Efficient detection

Achieves an F1 score of 0.83 on the test set.

Model Capabilities

Italian text classification

Hate speech detection

Insulting language identification

Offensive content identification

Use Cases

Content moderation

Social media content filtering

Automatically flags hate speech and insulting content on social media so that it can be filtered.

Intended to improve platform content safety and limit the spread of harmful content.

Online community management

Helps administrators identify and handle offensive remarks within their communities.

Intended to reduce user conflicts and support a healthier community environment.

🚀 HATE-ITA Base

HATE-ITA is a binary hate speech classification model designed for Italian social media text. It is intended to help identify and counter online hate speech in Italian.

🚀 Quick Start

The Basic Usage example below loads the model through the `transformers` pipeline API and classifies a short Italian sentence.

✨ Features

Multi-language Training: HATE-ITA is a set of multi-language models trained on a large set of English data and available Italian datasets, which performs better than monolingual models.
Good Adaptability: It seems to adapt well to language-specific slurs.

📦 Installation

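The source provides no dedicated installation steps. The usage example below imports the `transformers` library, which therefore needs to be available in your environment.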

💻 Usage Examples

Basic Usage

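from transformers import pipeline

# Load the HATE-ITA checkpoint; top_k=2 returns a score for each of the two classes.
classifier = pipeline("text-classification", model="MilaNLProc/hate-ita", top_k=2)

# "ti odio" is Italian for "I hate you".
prediction = classifier("ti odio")
print(prediction)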
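
The nesting of the value returned by the pipeline may differ depending on the `transformers` version and on how `top_k` is passed (a flat list of label/score dictionaries, or that list wrapped in an outer list), and the exact label strings are not stated here. The following is a minimal sketch for a single input text, assuming each entry is a dictionary with `label` and `score` keys. The helper names `flatten_prediction` and `top_label` are hypothetical, and the label strings in the sample are placeholders rather than the model's confirmed labels.

```python
from typing import Any, Dict, List, Tuple


def flatten_prediction(prediction: Any) -> List[Dict[str, Any]]:
    """Unwrap outer lists until a flat list of {'label', 'score'} dicts remains."""
    while (
        isinstance(prediction, list)
        and prediction
        and isinstance(prediction[0], list)
    ):
        prediction = prediction[0]
    if isinstance(prediction, dict):
        # A single dict (no top_k) is treated as a one-element list.
        prediction = [prediction]
    return list(prediction)


def top_label(prediction: Any) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    scores = flatten_prediction(prediction)
    best = max(scores, key=lambda item: item["score"])
    return best["label"], float(best["score"])


# Placeholder data shaped like a pipeline result; the labels are NOT
# the model's real label names.
sample = [[{"label": "LABEL_A", "score": 0.91}, {"label": "LABEL_B", "score": 0.09}]]
print(top_label(sample))
```

With the classifier from the example above, `top_label(classifier("ti odio"))` would return the higher-scoring class and its score, whichever nesting the installed version produces.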

📚 Documentation

Abstract

Online hate speech is a dangerous phenomenon that can (and should) be promptly counteracted properly. While Natural Language Processing has been successfully used for the purpose, many of the research efforts are directed toward the English language. This choice severely limits the classification power in non-English languages. In this paper, we test several learning frameworks for identifying hate speech in Italian text. We release HATE-ITA, a set of multi-language models trained on a large set of English data and available Italian datasets. HATE-ITA performs better than mono-lingual models and seems to adapt well also on language-specific slurs. We believe our findings will encourage research in other mid-to-low resource communities and provide a valuable benchmarking tool for the Italian community.

Model

This model is the fine-tuned version of the XLM-T model.

Property	Details
Model Type	The fine-tuned version of the XLM-T model

Model	Download
`hate-ita`	Link
`hate-ita-xlm-r-base`	Link
`hate-ita-xlm-r-large`	Link

Results

This model had an F1 of 0.83 on the test set.
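
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. This page does not say which averaging scheme (for example positive-class or macro) lies behind the 0.83 figure, so the sketch below only shows the per-class formula. The function name `f1_from_counts` is hypothetical, and the counts are made-up illustration values, not the model's results.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Compute F1 for one class from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: precision = 0.8 and recall = 0.8.
print(round(f1_from_counts(tp=80, fp=20, fn=20), 2))
```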

Citation

Please use the following BibTeX entry if you use this model in your project:

@inproceedings{nozza-etal-2022-hate-ita,
    title = {{HATE-ITA}: Hate Speech Detection in Italian Social Media Text},
    author = "Nozza, Debora and Bianchi, Federico and Attanasio, Giuseppe",
    booktitle = "Proceedings of the 6th Workshop on Online Abuse and Harms",
    year = "2022",
    publisher = "Association for Computational Linguistics"
}

Ethical Statement

While promising, the results in this work should not be interpreted as a definitive assessment of the performance of hate speech detection in Italian. We are unsure if our model can maintain a stable and fair precision across the different targets and categories. HATE-ITA might overlook some sensible details, which practitioners should treat with care.
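
One way a practitioner might probe this concern is to compute precision separately for each target group on a labelled evaluation set. The sketch below assumes a hypothetical record format of `(target, gold_is_hate, predicted_is_hate)` tuples. Neither that format, the function name `precision_by_target`, nor the group names come from the HATE-ITA authors.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (target group, gold label is hate, predicted label is hate)
Record = Tuple[str, bool, bool]


def precision_by_target(records: Iterable[Record]) -> Dict[str, Optional[float]]:
    """Return precision per target group, or None if nothing was flagged for it."""
    true_pos: Dict[str, int] = defaultdict(int)
    false_pos: Dict[str, int] = defaultdict(int)
    targets = set()
    for target, gold, predicted in records:
        targets.add(target)
        if predicted and gold:
            true_pos[target] += 1
        elif predicted and not gold:
            false_pos[target] += 1
    result: Dict[str, Optional[float]] = {}
    for target in sorted(targets):
        flagged = true_pos[target] + false_pos[target]
        result[target] = true_pos[target] / flagged if flagged else None
    return result


# Made-up evaluation records for illustration only.
records = [
    ("group_a", True, True),
    ("group_a", False, True),
    ("group_b", True, True),
    ("group_b", True, False),
]
print(precision_by_target(records))
```

A large gap between groups on a real evaluation set would suggest that flagged content for some targets deserves closer human review.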

📄 License

GNU GPLv3

Authors

Debora Nozza • Federico Bianchi • Giuseppe Attanasio

Widget Examples

Hate Speech Classification 1: "Ci sono dei bellissimi capibara!"
Hate Speech Classification 2: "Sei una testa di cazzo!!"
Hate Speech Classification 3: "Ti odio!"
