Russian Inappropriate Messages

Developed by apanc

Designed to detect inappropriate content in Russian that lacks profanity but may harm the speaker's reputation

Text Classification Other #Russian content moderation #Non-toxic inappropriateness detection #Sensitive topic identification

Downloads 4,039

Release Time : 3/2/2022

Model Overview

This model serves as an additional layer after toxicity filtering, specifically detecting subtly inappropriate messages in Russian. Based on sensitive topic classification, it can identify potentially harmful expressions such as justifying violence or offending religious sentiments.

Model Features

Fine-grained inappropriateness detection

Focuses on non-toxic but reputation-damaging expressions, such as justifying criminal behavior or offending religious sentiments

Sensitive topic correlation

Detection of inappropriate content strongly associated with specific sensitive topics (e.g., religion, crime)

Multi-stage filtering

Designed as a complementary filtering layer after toxicity detection, forming a multi-stage content moderation process

Model Capabilities

Russian text classification

Inappropriate content identification

Sensitive topic correlation analysis

Use Cases

Content moderation

Social media filtering

Additional inappropriate content detection after basic toxicity filtering

Achieves 89% accuracy on the test set

Corporate reputation protection

Detecting potentially damaging employee/user statements

Identifies non-explicit but potentially risky expressions

Academic research

Linguistic behavior analysis

Studying linguistic features of inappropriate expressions in Russian

🚀 Russian Inappropriate Messages Classification Model

This model is designed to detect inappropriate messages in the Russian language. It focuses on a derivative of toxicity, providing an additional layer of filtering after toxicity and obscenity checks.

🚀 Quick Start

The 'inappropriateness' concept in this model is not a substitute for toxicity but a derivative of it. The model can serve as an additional layer of inappropriateness filtering after toxicity and obscenity filtration. You can detect the exact sensitive topic by using another model. The proposed pipeline is shown in the scheme below.

[Figure: scheme of the proposed filtering pipeline]
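A minimal sketch of this staged pipeline, assuming each stage is a function that maps a text to a probability. The names `moderate`, `Verdict`, and `load_hf_scorer`, as well as the 0.5 threshold, are illustrative and not part of the original release. The convention that class index 1 means "inappropriate" is also an assumption that should be checked against the published checkpoint.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A scorer maps a text to the probability of the positive class (0.0 to 1.0).
Scorer = Callable[[str], float]


@dataclass
class Verdict:
    label: str                    # "toxic", "inappropriate", or "ok"
    score: float                  # score of the stage that decided the label
    topic: Optional[str] = None   # filled only for inappropriate messages


def moderate(
    text: str,
    toxicity: Scorer,
    inappropriateness: Scorer,
    topic_of: Optional[Callable[[str], str]] = None,
    threshold: float = 0.5,       # hypothetical cut-off, tune on validation data
) -> Verdict:
    """Run toxicity filtering first, then the inappropriateness layer."""
    tox = toxicity(text)
    if tox >= threshold:
        return Verdict("toxic", tox)
    inapp = inappropriateness(text)
    if inapp >= threshold:
        # The sensitive topic comes from a separate model, if one is supplied.
        topic = topic_of(text) if topic_of is not None else None
        return Verdict("inappropriate", inapp, topic)
    return Verdict("ok", inapp)


def load_hf_scorer(model_id: str, positive_index: int = 1) -> Scorer:
    """Wrap a Hugging Face sequence classifier as a Scorer.

    Assumes `transformers` and `torch` are installed and that
    `positive_index` is the class to score (an assumption; check the
    checkpoint's config).
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    def score(text: str) -> float:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        return torch.softmax(logits, dim=-1)[0, positive_index].item()

    return score
```

To use the real classifier, pass `load_hf_scorer("<model-id>")` as the `inappropriateness` argument, where `<model-id>` is a placeholder for the checkpoint's identifier. Keeping the stages as plain callables means any toxicity filter or topic model can be plugged in without changing `moderate`.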

You can also train one classifier for both toxicity and inappropriateness detection. The data to be mixed with toxic-labelled samples can be found on our GitHub or on Kaggle.
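One way to prepare data for such a joint classifier is sketched below. It assumes both datasets are available as `(text, label)` pairs with binary labels and collapses them into a single "unacceptable vs. acceptable" target. That target, the sample format, and the `mix_datasets` helper are illustrative choices, not the authors' procedure. A three-class target (acceptable, toxic, inappropriate) would be an equally reasonable alternative.

```python
import random
from typing import Iterable, List, Tuple

Sample = Tuple[str, int]  # (text, label) with label 1 = positive, 0 = negative


def mix_datasets(
    toxic: Iterable[Sample],
    inappropriate: Iterable[Sample],
    seed: int = 0,
) -> List[Sample]:
    """Merge two binary datasets into one 'unacceptable vs. acceptable' set.

    A text is labelled 1 if either source marks it positive.
    """
    merged = {}
    for text, label in list(toxic) + list(inappropriate):
        merged[text] = max(merged.get(text, 0), label)
    samples = sorted(merged.items())   # deterministic order before shuffling
    random.Random(seed).shuffle(samples)
    return samples
```

Deduplicating by text before shuffling keeps a message that appears in both sources from entering the training set twice with conflicting labels.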

✨ Features

Inappropriateness Intuition

This model is trained on a dataset of inappropriate Russian messages. An inappropriate utterance generally contains no obscene words and carries no toxic intent, but it can still harm the speaker's reputation. The table below gives some samples for intuition. Learn more about the concept of inappropriateness in this article, presented at the Workshop on Balto-Slavic NLP at the EACL 2021 conference. Note that the article describes the first version of the dataset, while the model is trained on the extended version available on our GitHub or on Kaggle. The dataset properties are the same as described in the article; only the size differs.

Text	Inappropriate	Comment	Unsafe topic
Ладно бы видного деятеля завалили а тут какого то ноунейм нигру преступника	yes	Justification of murder	offline crime
Это нарушение УКРФ!	no	Condemnation of breaking the law	offline crime
Религия - это насилие высшего уровня. Верующего принуждают к вере еще в младенчестве.	yes	Insulting the feelings of believers	religion
Христиане, как вы относитесь к иноверцам, в частности к Гамалиилу и к уверовавшим в Иисуса фарисеям?	no	Ordinary questions to believers with no hint of insulting their feelings	religion
Хорошо что проституция запрещена	no	Condemnation of prostitution	prostitution
Возьму бабки и сниму шлюх на выходных	yes	Promotion of prostitution	prostitution

Metrics

The model was trained, validated, and tested only on samples with 100% confidence, achieving the following metrics on the test set:

	precision	recall	f1-score	support
0	0.92	0.93	0.93	7839
1	0.80	0.76	0.78	2726
accuracy			0.89	10565
macro avg	0.86	0.85	0.85	10565
weighted avg	0.89	0.89	0.89	10565
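As a sanity check, the aggregate rows can be recomputed from the two per-class rows. The sketch below uses only the rounded values printed in the table, so its results agree with the reported figures only up to rounding (about 0.01). Treating class 1 as the "inappropriate" class is an assumption; the report itself does not name the classes.

```python
# Per-class rows copied from the test-set report: (precision, recall, support).
rows = {0: (0.92, 0.93, 7839), 1: (0.80, 0.76, 2726)}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(support for _, _, support in rows.values())

per_class_f1 = {c: f1(p, r) for c, (p, r, _) in rows.items()}
macro_f1 = sum(per_class_f1.values()) / len(rows)
weighted_f1 = sum(per_class_f1[c] * rows[c][2] for c in rows) / total

# Accuracy equals the support-weighted average of per-class recall.
accuracy = sum(r * s for _, r, s in rows.values()) / total

# Class-1 messages the model misses (assuming class 1 is "inappropriate").
missed = round((1 - rows[1][1]) * rows[1][2])

print(f"macro F1 {macro_f1:.3f}, weighted F1 {weighted_f1:.3f}, "
      f"accuracy {accuracy:.3f}, missed class-1 messages {missed}")
```

The gap between class-0 and class-1 recall is worth noting when the model is used as a filter: overall accuracy is dominated by the larger class 0, while the share of class-1 messages that slip through is set by class-1 recall alone.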

📄 License

This project is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png

📚 Documentation

Citation

If you find this repository helpful, feel free to cite our publication:

@inproceedings{babakov-etal-2021-detecting,
    title = "Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company{'}s Reputation",
    author = "Babakov, Nikolay  and
      Logacheva, Varvara  and
      Kozlova, Olga  and
      Semenov, Nikita  and
      Panchenko, Alexander",
    booktitle = "Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Kiyv, Ukraine",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.bsnlp-1.4",
    pages = "26--36",
    abstract = "Not all topics are equally {``}flammable{''} in terms of toxicity: a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness. While toxicity in user-generated data is well-studied, we aim at defining a more fine-grained notion of inappropriateness. The core of inappropriateness is that it can harm the reputation of a speaker. This is different from toxicity in two respects: (i) inappropriateness is topic-related, and (ii) inappropriate message is not toxic but still unacceptable. We collect and release two datasets for Russian: a topic-labelled dataset and an appropriateness-labelled dataset. We also release pre-trained classification models trained on this data.",
}

📞 Contacts

If you have any questions, please contact Nikolay.
