xlm-roberta-large-tydip Open-source Model - Multilingual Politeness Judgment Supporting 10 Languages


Developed by Genius1237

A multilingual politeness classification model based on the xlm-roberta-large architecture and fine-tuned on the English subset of the TyDiP dataset. It was evaluated on politeness judgment in 10 languages.

Text Classification

Transformers

Supports Multiple Languages

Open Source License: MIT

#Multilingual Politeness Analysis #Cross-lingual Text Classification #High-accuracy XLMR

Downloads 929

Release Time: 4/20/2023

Model Overview

This model classifies text as polite or impolite and is designed for multilingual use. On the TyDiP test set, its reported accuracy ranges from 0.668 (Russian) to 0.892 (English) across the 10 evaluated languages.

Model Features

Multilingual Support

Supports politeness classification in 10 languages, including languages written in non-Latin scripts such as Hindi and Korean

High Accuracy

Achieves 0.892 accuracy on the English test set, and between 0.668 and 0.868 on the other nine evaluated languages

Cross-lingual Capability

Fine-tuned on English only, the model relies on XLM-R's cross-lingual transfer for the other nine languages, so it may also be applicable to languages beyond those evaluated

Model Capabilities

Multilingual Text Classification

Politeness Judgment

Cross-lingual Transfer Learning

Use Cases

Social Media Analysis

Comment Politeness Filtering

Automatically identifies the politeness level of social media comments

Helps filter out impolite content

Customer Service Systems

Customer Service Response Quality Monitoring

Evaluates the politeness level of customer service responses

Improves customer service quality

🚀 Multilingual Politeness Classification Model

This model classifies text as polite or impolite across multiple languages, using xlm-roberta-large as its base.

🚀 Quick Start

This model is based on xlm-roberta-large and is fine-tuned on the English subset of the TyDiP dataset, as described in the TyDiP paper (see Citation).

✨ Features

Multilingual Support: In the paper, this model was evaluated on English plus 9 other languages (Hindi, Korean, Spanish, Tamil, French, Vietnamese, Russian, Afrikaans, Hungarian). Given the model's good performance and XLM-R's cross-lingual abilities, it is likely that this fine-tuned model can be used for more languages as well.
High-Accuracy Classification: On the TyDiP test set, accuracy is 0.80 or higher in 7 of the 10 languages, with English highest (0.892) and Russian lowest (0.668).

📦 Installation

The original README gives no installation steps. To run the examples below, install the transformers library together with PyTorch, which the examples use as the backend: pip install transformers torch.

💻 Usage Examples

Basic Usage

You can use this model directly with a text-classification pipeline.

from transformers import pipeline

classifier = pipeline(task="text-classification", model="Genius1237/xlm-roberta-large-tydip")

sentences = ["Could you please get me a glass of water", "mere liye पानी का एक गिलास ले आओ "]

print(classifier(sentences))
# [{'label': 'polite', 'score': 0.9076159000396729}, {'label': 'impolite', 'score': 0.765066385269165}]
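
The pipeline returns one dict per input, each with a `label` and a `score`. For the comment-filtering use case, a post-processing step could look roughly like the sketch below. The helper name `keep_polite` and the `min_score` threshold are illustrative assumptions, not part of the model or of the transformers API. The sample predictions are copied from the example output above so that the snippet runs without downloading the model.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def keep_polite(
    texts: List[str],
    predictions: List[Prediction],
    min_score: float = 0.5,
) -> List[str]:
    """Return the texts predicted 'polite' with a score of at least min_score.

    keep_polite and min_score are hypothetical names used for illustration.
    """
    if len(texts) != len(predictions):
        raise ValueError("texts and predictions must have the same length")
    return [
        text
        for text, pred in zip(texts, predictions)
        if pred["label"] == "polite" and pred["score"] >= min_score
    ]


sentences = ["Could you please get me a glass of water", "mere liye पानी का एक गिलास ले आओ "]

# Copied from the example output above; in practice use
# predictions = classifier(sentences)
predictions = [
    {"label": "polite", "score": 0.9076159000396729},
    {"label": "impolite", "score": 0.765066385269165},
]

print(keep_polite(sentences, predictions, min_score=0.8))
# ['Could you please get me a glass of water']
```

A suitable threshold depends on the application and the language; given the lower accuracy reported for some languages, it is probably worth tuning on held-out data rather than fixing in advance.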

Advanced Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained('Genius1237/xlm-roberta-large-tydip')
model = AutoModelForSequenceClassification.from_pretrained('Genius1237/xlm-roberta-large-tydip')

text = "Could you please get me a glass of water"
encoded_input = tokenizer(text, return_tensors='pt')

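# Inference only, so skip gradient tracking
with torch.no_grad():
    output = model(**encoded_input)

# logits has shape (1, num_labels); pick the highest-scoring class
prediction = torch.argmax(output.logits, dim=-1).item()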

print(model.config.id2label[prediction])
# polite
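
The advanced example only reports the winning class. If a confidence value is also needed, the logits can be passed through a softmax, which should correspond to the `score` the pipeline reports for a single-label model. Below is a minimal sketch in plain Python. The logit values and the label mapping are made up for illustration; with the real model, read them from `output.logits` and `model.config.id2label`.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical values for illustration only. With the real model, use
# output.logits[0].tolist() and model.config.id2label instead.
example_logits = [-1.2, 1.1]
example_id2label = {0: "impolite", 1: "polite"}

label, score = top_label(example_logits, example_id2label)
print(label, round(score, 3))
```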

📚 Documentation

Evaluation

The politeness classification accuracy scores on 10 languages from the TyDiP test set are as follows:

Language	Accuracy
en	0.892
hi	0.868
ko	0.784
es	0.84
ta	0.78
fr	0.82
vi	0.844
ru	0.668
af	0.856
hu	0.812
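
As a quick summary of the table, the sketch below computes the unweighted mean accuracy over all ten languages and over the nine non-English languages, which were not part of the fine-tuning data. The numbers are copied from the table; the variable names are only illustrative.

```python
# Accuracy per language, copied from the table above.
accuracy = {
    "en": 0.892, "hi": 0.868, "ko": 0.784, "es": 0.84, "ta": 0.78,
    "fr": 0.82, "vi": 0.844, "ru": 0.668, "af": 0.856, "hu": 0.812,
}

# Unweighted mean over all ten languages.
mean_all = sum(accuracy.values()) / len(accuracy)

# Mean over the languages the model was not fine-tuned on.
zero_shot = {lang: acc for lang, acc in accuracy.items() if lang != "en"}
mean_zero_shot = sum(zero_shot.values()) / len(zero_shot)

# Language with the lowest reported accuracy.
weakest = min(accuracy, key=accuracy.get)

print(f"mean over 10 languages: {mean_all:.3f}")                 # 0.816
print(f"mean over 9 non-English languages: {mean_zero_shot:.3f}")  # 0.808
print(f"lowest accuracy: {weakest} ({accuracy[weakest]:.3f})")   # ru (0.668)
```

The gap between the English score (0.892) and the nine-language mean (0.808) gives a rough sense of how much accuracy is lost in cross-lingual transfer.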

📄 License

This project is licensed under the MIT license.

📚 Citation

@inproceedings{srinivasan-choi-2022-tydip,
    title = "{T}y{D}i{P}: A Dataset for Politeness Classification in Nine Typologically Diverse Languages",
    author = "Srinivasan, Anirudh  and
      Choi, Eunsol",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.420",
    doi = "10.18653/v1/2022.findings-emnlp.420",
    pages = "5723--5738",
    abstract = "We study politeness phenomena in nine typologically diverse languages. Politeness is an important facet of communication and is sometimes argued to be cultural-specific, yet existing computational linguistic study is limited to English. We create TyDiP, a dataset containing three-way politeness annotations for 500 examples in each language, totaling 4.5K examples. We evaluate how well multilingual models can identify politeness levels {--} they show a fairly robust zero-shot transfer ability, yet fall short of estimated human accuracy significantly. We further study mapping the English politeness strategy lexicon into nine languages via automatic translation and lexicon induction, analyzing whether each strategy{'}s impact stays consistent across languages. Lastly, we empirically study the complicated relationship between formality and politeness through transfer experiments. We hope our dataset will support various research questions and applications, from evaluating multilingual models to constructing polite multilingual agents.",
}
