xlmr_formality_classifier Open-source Model - Supports Multi-language Text Formality Classification, Including English, French, Italian, and Portuguese

Xlmr Formality Classifier

Developed by s-nlp

A multilingual text formality classification model based on XLM-Roberta, supporting English, French, Italian, and Portuguese

Text Classification

Transformers

Tags: Supports Multiple Languages, Multilingual Formality Classification, Text Style Detection, XLM-Roberta Model

Release Time: 3/2/2022

Model Overview

This model detects the formality level of text, classifying each input as either 'formal' or 'informal'. It was trained on XFORMAL, a multilingual formality classification dataset.

Model Features

Multilingual Support

Supports formality classification in four languages: English, French, Italian, and Portuguese

High Accuracy

Achieves 85.2% accuracy on English. French, Italian, and Portuguese reach 79.6%, 76.9%, and 76.3% respectively, and accuracy across all four languages combined is 79.4%.

Transformer-based Architecture

Uses XLM-Roberta-base as the base model, providing robust text understanding capabilities

Model Capabilities

Text Formality Classification

Multilingual Text Analysis

Use Cases

Text Processing

Formal Document Filtering

Automatically identifies and classifies formal and informal documents

Can be used for automatic classification in document management systems

Writing Assistance Tool

Helps users check the formality level of a text; the resulting scores can be used to drive writing suggestions

Helps writers match the formality expected in the target context

Content Moderation

Content Appropriateness Check

Identifies informal content unsuitable for formal contexts

Can be used for automatic moderation in forums and comment sections
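
As a rough sketch of the moderation use case, the function below flags texts whose informal probability exceeds a threshold. The names `flag_informal`, `score_fn`, and `dummy_score_fn`, and the default threshold of 0.5, are assumptions made for illustration and are not part of the model or of the transformers library. `score_fn` stands for any callable that returns a dict with "formal" and "informal" probabilities, such as a wrapper around this classifier.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]


def flag_informal(
    texts: List[str],
    score_fn: Callable[[str], Scores],
    threshold: float = 0.5,  # assumed cut-off; tune it on your own data
) -> List[str]:
    """Return the texts whose 'informal' probability exceeds the threshold."""
    return [t for t in texts if score_fn(t)["informal"] > threshold]


# Stand-in scorer so the sketch runs without downloading the model:
# it simply treats texts that do not start with a capital letter as informal.
def dummy_score_fn(text: str) -> Scores:
    informal = 0.1 if text[:1].isupper() else 0.9
    return {"formal": 1.0 - informal, "informal": informal}


flagged = flag_informal(["Dear Sir or Madam,", "lol ok"], dummy_score_fn)
print(flagged)
```

In practice the stand-in scorer would be replaced by a function that runs the classifier on one text and returns its softmax scores.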

🚀 XLM-Roberta Formality Classifier

This is a multilingual text formality classifier. It's based on XLM-Roberta and trained on a multilingual formality classification dataset, aiming to accurately classify text as formal or informal across multiple languages.

✨ Features

Multilingual Support: Supports multiple languages including English, French, Italian, and Portuguese.
High-Performance Classification: Trained on a large-scale multilingual dataset for accurate formality classification.

📦 Installation

The model requires the transformers library and PyTorch, since the usage example returns PyTorch tensors. Both can be installed with pip install transformers torch.

💻 Usage Examples

Basic Usage

import torch
from transformers import XLMRobertaTokenizerFast, XLMRobertaForSequenceClassification

# load tokenizer and model weights
tokenizer = XLMRobertaTokenizerFast.from_pretrained('s-nlp/xlmr_formality_classifier')
model = XLMRobertaForSequenceClassification.from_pretrained('s-nlp/xlmr_formality_classifier')
model.eval()  # inference mode (disables dropout)

# class index -> label
id2formality = {0: "formal", 1: "informal"}
texts = [
    "I like you. I love you",
    "Hey, what's up?",
    "Siema, co porabiasz?",
    "I feel deep regret and sadness about the situation in international politics.",
]

# prepare the input; "max_length" pads every text to the model's maximum length
encoding = tokenizer(
    texts,
    add_special_tokens=True,
    return_token_type_ids=True,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)

# inference; gradients are not needed for prediction
with torch.no_grad():
    output = model(**encoding)

# convert the logits to per-class probabilities, one dict per input text
formality_scores = [
    {id2formality[idx]: score for idx, score in enumerate(text_scores.tolist())}
    for text_scores in output.logits.softmax(dim=1)
]
print(formality_scores)

Output Example

[{'formal': 0.993225634098053, 'informal': 0.006774314679205418},
 {'formal': 0.8807966113090515, 'informal': 0.1192033663392067},
 {'formal': 0.936184287071228, 'informal': 0.06381577253341675},
 {'formal': 0.9986615180969238, 'informal': 0.0013385231141000986}]
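
The score dictionaries above can be reduced to a single label per text. The following is a minimal sketch, assuming the same list-of-dicts structure as the output example; `top_label` is a hypothetical helper name, not part of the model or the transformers API.

```python
# Hypothetical helper: pick the most probable class from one score dict.
def top_label(scores):
    """Return (label, probability) for the highest-scoring class."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Scores copied from the first two entries of the output example.
formality_scores = [
    {"formal": 0.993225634098053, "informal": 0.006774314679205418},
    {"formal": 0.8807966113090515, "informal": 0.1192033663392067},
]

labels = [top_label(scores) for scores in formality_scores]
for label, prob in labels:
    print(f"{label} ({prob:.3f})")
```

Keeping the probability next to the label lets downstream code treat low-confidence predictions differently, for example by sending them to manual review.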

📚 Documentation

Model Overview

This is the model presented in the paper "Detecting Text Formality: A Study of Text Classification Approaches". It is an XLM-Roberta-based classifier trained on XFORMAL, a multilingual formality classification dataset.

Property	Details
Model Type	XLM-Roberta-based classifier
Training Data	XFORMAL

Results

All Languages

	precision	recall	f1-score	support
0	0.744912	0.927790	0.826354	108019
1	0.889088	0.645630	0.748048	96845
accuracy			0.794405	204864
macro avg	0.817000	0.786710	0.787201	204864
weighted avg	0.813068	0.794405	0.789337	204864
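
The columns of the table above are linked by standard definitions: F1 is the harmonic mean of precision and recall, overall accuracy equals the support-weighted mean of per-class recall, and the macro and weighted averages differ only in how they weight the classes. The sketch below rechecks the All Languages rows from the precision, recall, and support values in the table; the variable names are illustrative.

```python
# Values copied from the "All Languages" table: (precision, recall, support).
rows = {
    0: (0.744912, 0.927790, 108019),
    1: (0.889088, 0.645630, 96845),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_scores = {label: f1(p, r) for label, (p, r, _) in rows.items()}
total = sum(support for _, _, support in rows.values())

# Accuracy is the support-weighted mean of per-class recall.
accuracy = sum(r * support for _, r, support in rows.values()) / total

# Macro average weights both classes equally; weighted average uses support.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
weighted_f1 = sum(f1_scores[k] * rows[k][2] for k in rows) / total

print(f"f1: {f1_scores}")
print(f"accuracy={accuracy:.6f} macro_f1={macro_f1:.6f} weighted_f1={weighted_f1:.6f}")
```

The same check applies to the per-language tables by swapping in their precision, recall, and support values.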

English (EN)

	precision	recall	f1-score	support
0	0.800053	0.962981	0.873988	22151
1	0.945106	0.725899	0.821124	19449
accuracy			0.852139	41600
macro avg	0.872579	0.844440	0.847556	41600
weighted avg	0.867869	0.852139	0.849273	41600

French (FR)

	precision	recall	f1-score	support
0	0.746709	0.925738	0.826641	21505
1	0.887305	0.650592	0.750731	19327
accuracy			0.795504	40832
macro avg	0.817007	0.788165	0.788686	40832
weighted avg	0.813257	0.795504	0.790711	40832

Italian (IT)

	precision	recall	f1-score	support
0	0.721282	0.914669	0.806545	21528
1	0.864887	0.607135	0.713445	19368
accuracy			0.769024	40896
macro avg	0.793084	0.760902	0.759995	40896
weighted avg	0.789292	0.769024	0.762454	40896

Portuguese (PT)

	precision	recall	f1-score	support
0	0.717546	0.908167	0.801681	21637
1	0.853628	0.599700	0.704481	19323
accuracy			0.762646	40960
macro avg	0.785587	0.753933	0.753081	40960
weighted avg	0.781743	0.762646	0.755826	40960

📄 License

This model is licensed under the OpenRAIL++ License, which supports the development of technologies, both industrial and academic, that serve the public good.

📖 Citation

@inproceedings{dementieva-etal-2023-detecting,
    title = "Detecting Text Formality: A Study of Text Classification Approaches",
    author = "Dementieva, Daryna  and
      Babakov, Nikolay  and
      Panchenko, Alexander",
    editor = "Mitkov, Ruslan  and
      Angelova, Galia",
    booktitle = "Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing",
    month = sep,
    year = "2023",
    address = "Varna, Bulgaria",
    publisher = "INCOMA Ltd., Shoumen, Bulgaria",
    url = "https://aclanthology.org/2023.ranlp-1.31",
    pages = "274--284",
    abstract = "Formality is one of the important characteristics of text documents. The automatic detection of the formality level of a text is potentially beneficial for various natural language processing tasks. Before, two large-scale datasets were introduced for multiple languages featuring formality annotation{---}GYAFC and X-FORMAL. However, they were primarily used for the training of style transfer models. At the same time, the detection of text formality on its own may also be a useful application. This work proposes the first to our knowledge systematic study of formality detection methods based on statistical, neural-based, and Transformer-based machine learning methods and delivers the best-performing models for public usage. We conducted three types of experiments {--} monolingual, multilingual, and cross-lingual. The study shows the overcome of Char BiLSTM model over Transformer-based ones for the monolingual and multilingual formality classification task, while Transformer-based classifiers are more stable to cross-lingual knowledge transfer.",
}
