Sloberta-Frenk-Hate: Open-Source Model for Detecting Hate Speech Against the LGBT Community and Immigrants in Slovene

Sloberta Frenk Hate

Developed by classla

A Slovenian hate speech classifier fine-tuned on the SloBERTa model, specifically designed for detecting offensive language targeting LGBT communities and immigrants

Text Classification

Transformers

Other: #Slovenian hate speech detection #Binary text classification #FRENK dataset fine-tuning


Release Time : 3/2/2022

Model Overview

This is a text classification model based on EMBEDDIA/sloberta and fine-tuned on the Slovenian portion of the FRENK dataset. It identifies hate speech and offensive language targeting specific groups.

Model Features

Optimized for specific groups

Specifically optimized for hate speech related to LGBT communities and immigrants

Binary classification

The original dataset labels were collapsed into two classes (offensive/acceptable)

Superior performance

Achieves higher average accuracy and macro F1 than EMBEDDIA/crosloengual-bert, xlm-roberta-base, and fasttext when run through the same pipeline on the Slovenian data

Model Capabilities

Text classification

Hate speech detection

Offensive language identification

Use Cases

Content moderation

Social media content filtering

Automatically identifies and filters hate speech targeting LGBT communities and immigrants on social media

77.85% average accuracy, 77.64% average macro F1 score

Academic research

Linguistic behavior research

Used to study linguistic characteristics and patterns of hate speech in Slovenian

🚀 Sloberta Frenk Hate - Text Classification Model

This is a text classification model based on EMBEDDIA/sloberta and fine-tuned on the FRENK dataset, which consists of hate speech targeting LGBT people and migrants. Only the Slovenian subset of the data was used for fine-tuning, and the dataset was relabeled for binary classification (offensive or acceptable).
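The relabeling step can be illustrated with a minimal sketch. It assumes that every category other than the acceptable one is collapsed into the offensive class, and that 1 stands for offensive and 0 for acceptable (consistent with the usage example's output). The label strings and the sample rows are hypothetical and should be checked against the actual FRENK files; this is not necessarily the authors' exact procedure.

```python
# Assumed spelling of the "acceptable" category; verify against the dataset.
ACCEPTABLE_LABEL = "Acceptable speech"


def to_binary(original_label: str) -> int:
    """Return 0 for acceptable text and 1 for any other category."""
    is_acceptable = original_label.strip().lower() == ACCEPTABLE_LABEL.lower()
    return 0 if is_acceptable else 1


# Hypothetical rows: (text, original fine-grained label).
rows = [
    ("primer 1", "Acceptable speech"),
    ("primer 2", "Background offensive"),
    ("primer 3", "Other violence"),
]

# Keep the text and replace the fine-grained label with the binary one.
binary_rows = [(text, to_binary(label)) for text, label in rows]
print(binary_rows)
```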

🚀 Quick Start

The model is built on EMBEDDIA/sloberta and fine-tuned on the FRENK dataset. It classifies Slovenian text as offensive or acceptable, with a focus on hate speech targeting LGBT people and migrants.

✨ Features

Based on the EMBEDDIA/sloberta architecture.
Fine-tuned on a dataset relabeled for binary classification (offensive or acceptable).
Achieves higher average accuracy and macro F1 score than two other transformer models (EMBEDDIA/crosloengual-bert and xlm-roberta-base) and fasttext.

📦 Installation

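The original model card gives no installation steps. The usage example below relies on the simpletransformers package, which must be installed beforehand (for example with pip install simpletransformers).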

💻 Usage Examples

Basic Usage

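from simpletransformers.classification import ClassificationModel

# These arguments only affect training; they are kept as in the original example.
model_args = {
    "num_train_epochs": 6,
    "learning_rate": 3e-6,
    "train_batch_size": 69,
}

# SloBERTa uses the CamemBERT architecture, hence the "camembert" model type.
# Set use_cuda=False if no GPU is available.
model = ClassificationModel(
    "camembert", "5roop/sloberta-frenk-hate", use_cuda=True,
    args=model_args,
)

# Inputs: "Silva, you are ugly and unkind", "Our house has a chimney"
predictions, logit_output = model.predict(["Silva, ti si grda in neprijazna", "Naša hiša ima dimnik"])
predictions
### Output:
### array([1, 0])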
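The second value returned by `model.predict` holds the raw per-class scores. The sketch below shows one way to turn such scores into readable labels and confidence values. It assumes that index 0 means acceptable and index 1 means offensive (inferred from the example output above, not stated explicitly in the model card), and the logit values are made up for illustration.

```python
import math

# Assumed mapping, inferred from the example output array([1, 0]).
ID2LABEL = {0: "acceptable", 1: "offensive"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(texts, logit_rows):
    """Pair each text with its most likely label and that label's probability."""
    results = []
    for text, row in zip(texts, logit_rows):
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append({"text": text, "label": ID2LABEL[best], "confidence": probs[best]})
    return results


# Hypothetical logits standing in for `logit_output`.
example_texts = ["Silva, ti si grda in neprijazna", "Naša hiša ima dimnik"]
example_logits = [[-1.2, 1.5], [2.0, -1.8]]
results = describe(example_texts, example_logits)
for item in results:
    print(item["label"], round(item["confidence"], 3))
```

A confidence threshold on top of this output can be useful in moderation settings, where borderline cases are often routed to a human reviewer rather than filtered automatically.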

📚 Documentation

Fine-tuning hyperparameters

Fine-tuning was performed with simpletransformers. Beforehand, a brief hyperparameter optimisation was performed, and the presumed optimal hyperparameters are:

model_args = {
        "num_train_epochs": 14,
        "learning_rate": 1e-5,
        "train_batch_size": 21,
        }
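To get a feel for what these settings imply, the following sketch estimates the number of optimizer updates for a given training set size. The training set size used here is a hypothetical placeholder, not the size of the Slovenian FRENK subset, and the calculation assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept.

```python
import math

model_args = {
    "num_train_epochs": 14,
    "learning_rate": 1e-5,
    "train_batch_size": 21,
}


def total_optimizer_steps(num_examples, batch_size, epochs):
    """Number of parameter updates when each batch triggers one update."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# Hypothetical training set size, for illustration only.
HYPOTHETICAL_TRAIN_SIZE = 8000

steps = total_optimizer_steps(
    HYPOTHETICAL_TRAIN_SIZE,
    model_args["train_batch_size"],
    model_args["num_train_epochs"],
)
print(steps)
```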

Performance

The same pipeline was run with two other transformer models and fasttext for comparison. Accuracy and macro F1 score were recorded for each of the 6 fine-tuning sessions and analyzed afterwards.

model	average accuracy	average macro F1
sloberta-frenk-hate	0.7785	0.7764
EMBEDDIA/crosloengual-bert	0.7616	0.7585
xlm-roberta-base	0.686	0.6827
fasttext	0.709	0.701
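For readers unfamiliar with the two metrics in the table, here is a minimal pure-Python sketch of accuracy and macro F1 for the binary case. Macro F1 is the unweighted mean of the per-class F1 scores, so both classes count equally regardless of how many examples each has. The label vectors are invented for illustration and are not taken from the FRENK evaluation.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the F1 scores computed separately for each class."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions (1 = offensive, 0 = acceptable).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(acc, f1)
```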

From the recorded accuracies and macro F1 scores, p-values were also calculated:

Comparison with `crosloengual-bert`

test	accuracy p-value	macro F1 p-value
Wilcoxon	0.00781	0.00781
Mann-Whitney U test	0.00163	0.00108
Student's t-test	0.000101	3.95e-05

Comparison with `xlm-roberta-base`

test	accuracy p-value	macro F1 p-value
Wilcoxon	0.00781	0.00781
Mann-Whitney U test	0.00108	0.00108
Student's t-test	9.46e-11	6.94e-11
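As an illustration of how a paired comparison of per-run scores can be tested, the sketch below implements an exact two-sided Wilcoxon signed-rank test in pure Python. It is not the authors' analysis code, and the model card does not say which implementation or settings they used. The per-run accuracies are hypothetical, and the function name is introduced here only for illustration.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null hypothesis every sign pattern is equally likely.
    totals = [
        sum(r for r, s in zip(ranks, signs) if s)
        for signs in product((0, 1), repeat=n)
    ]
    p_low = sum(t <= w_plus for t in totals) / len(totals)
    p_high = sum(t >= w_plus for t in totals) / len(totals)
    return w_plus, min(1.0, 2 * min(p_low, p_high))


# Hypothetical per-run accuracies for two models over six paired runs.
model_a = [0.781, 0.776, 0.779, 0.774, 0.783, 0.778]
model_b = [0.762, 0.759, 0.765, 0.758, 0.763, 0.7625]

w_plus, p_value = wilcoxon_signed_rank_exact(model_a, model_b)
print(w_plus, p_value)
```

When one model wins every one of n paired runs, the exact two-sided p-value is 2/2^n, which is the smallest value this test can produce for that number of pairs.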

🔧 Technical Details

The model is based on EMBEDDIA/sloberta and fine-tuned using simpletransformers. Hyperparameter optimisation was carried out to find the optimal settings for fine-tuning.

📄 License

This project is licensed under the CC BY-SA 4.0 license.

📚 Citation

If you use the model, please cite the following paper on which the original model is based:

@article{DBLP:journals/corr/abs-1907-11692,
  author    = {Yinhan Liu and
               Myle Ott and
               Naman Goyal and
               Jingfei Du and
               Mandar Joshi and
               Danqi Chen and
               Omer Levy and
               Mike Lewis and
               Luke Zettlemoyer and
               Veselin Stoyanov},
  title     = {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach},
  journal   = {CoRR},
  volume    = {abs/1907.11692},
  year      = {2019},
  url       = {http://arxiv.org/abs/1907.11692},
  archivePrefix = {arXiv},
  eprint    = {1907.11692},
  timestamp = {Thu, 01 Aug 2019 08:59:33 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1907-11692.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

and the dataset used for fine-tuning:

@misc{ljubešić2019frenk,
      title={The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English}, 
      author={Nikola Ljubešić and Darja Fišer and Tomaž Erjavec},
      year={2019},
      eprint={1906.02045},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/1906.02045}
}
