roberta-base-formality-ranker Open-source Model - Freely Predict the Formality or Informality Level of English Sentences

Roberta Base Formality Ranker

Developed by s-nlp

This model is based on the RoBERTa architecture, specifically designed to predict the formality level of English sentences.

Text Classification

Transformers

English #Formality prediction #Text style classification #Multi-domain adaptation

Downloads 1,349

Release Time: 3/2/2022

Model Overview

The model is trained to accurately determine the formality level of English text, suitable for text style analysis and transformation tasks.

Model Features

High accuracy

Achieves 90.87% accuracy and 0.9779 ROC AUC score on the GYAFC test set.

Data augmentation processing

Avoids over-reliance on surface features through data augmentation methods like case conversion and punctuation removal.

Multi-dataset training

Trained on both GYAFC and Pavlick-Tetreault formality corpora.

Model Capabilities

Text formality classification

Style feature analysis

Use Cases

Text processing

Formality assessment

Automatically evaluates text formality for writing assistance tools.

Achieves an F1 score of 0.90 on the GYAFC test set.

Style conversion

Serves as a component in style transfer systems to maintain content consistency.

Education

Writing guidance

Helps students identify and adjust writing styles for different contexts.

🚀 English Sentence Formality Prediction Model

This model is trained to predict whether English sentences are formal or informal, offering a practical solution for text style analysis.

✨ Features

Trained to predict the formality of English sentences.
Utilizes roberta-base as the base model.
Trained on multiple datasets including GYAFC and an online formality corpus.
Applies data augmentation techniques to reduce over-reliance on punctuation and capitalization.

📦 Installation

No installation steps are provided in the original document.

💻 Usage Examples

No code examples are provided in the original document.
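
In the absence of an official example, the following is a minimal, hedged sketch of how the model might be queried through the Hugging Face `transformers` library (which, together with `torch`, would need to be installed first). The Hub identifier `s-nlp/roberta-base-formality-ranker` is inferred from the model and developer names above, and the helper names `score_sentences` and `top_label` are hypothetical. Label names are read from the model configuration rather than assumed.

```python
MODEL_ID = "s-nlp/roberta-base-formality-ranker"  # assumed Hub identifier


def score_sentences(sentences, model_id=MODEL_ID):
    """Return one {label: probability} dict per input sentence."""
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    probs = torch.softmax(logits, dim=-1)

    # Use the label names stored in the checkpoint's own configuration.
    id2label = model.config.id2label
    return [
        {id2label[i]: float(p) for i, p in enumerate(row)}
        for row in probs
    ]


def top_label(scores):
    """Pick the label with the highest probability from one score dict."""
    return max(scores, key=scores.get)
```

Calling `score_sentences(["I like you.", "I would be grateful for your reply."])` downloads the weights on first use; passing each returned dict to `top_label` gives the predicted class name for that sentence.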

📚 Documentation

Model Details

Base model: roberta-base
Datasets:
- GYAFC from Rao and Tetreault, 2018.
- online formality corpus from Pavlick and Tetreault, 2016.
Data augmentation: Changing texts to upper or lower case; removing all punctuation; adding a dot at the end of a sentence. This prevents the model from over-relying on punctuation and capitalization.
Loss: Binary classification (on GYAFC), in-batch ranking (on PT data).
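
The augmentation steps listed above could be sketched roughly as follows. This is an illustration only: the function names are hypothetical, and both the random selection policy and the treatment of each transformation as a separate step are assumptions, since the source does not say how the transformations were combined or how often they were applied.

```python
import random
import string


def strip_punctuation(text):
    """Remove all ASCII punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))


def add_final_dot(text):
    """Append a dot unless the sentence already ends with one."""
    stripped = text.rstrip()
    return stripped if stripped.endswith(".") else stripped + "."


# Candidate transformations named in the model details.
AUGMENTATIONS = [str.upper, str.lower, strip_punctuation, add_final_dot]


def augment(text, rng=random):
    """Apply one randomly chosen transformation (selection policy is an assumption)."""
    return rng.choice(AUGMENTATIONS)(text)
```

Because the formality label is kept unchanged while casing and punctuation vary, a classifier trained on such variants has less incentive to treat those surface cues as decisive.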

Performance Metrics on Test Data

Property	Details
Model Type	English Sentence Formality Prediction Model
Training Data	GYAFC, online formality corpus

Performance on GYAFC

Dataset	ROC AUC	Precision	Recall	F-score	Accuracy	Spearman
GYAFC	0.9779	0.90	0.91	0.90	0.9087	0.8233
GYAFC normalized (lowercase + remove punct.)	0.9234	0.85	0.81	0.82	0.8218	0.7294

Performance on P&T subset

P&T subset	Spearman R
news	0.4003
answers	0.7500
blog	0.7334
email	0.7606
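
For readers unfamiliar with the Spearman R values reported above, the sketch below shows one way such a rank correlation between model scores and human formality ratings could be computed. It is a plain-Python illustration, not the evaluation script used by the authors; the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A value near 1 means the model orders sentences almost exactly as the annotators did, regardless of the absolute scale of its scores.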

🔧 Technical Details

The model uses a binary classification loss on the GYAFC dataset and an in-batch ranking loss on the PT data. Data augmentation techniques are applied to make the model less dependent on punctuation and capitalization, enabling it to focus more on other text features.
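
To make the two objectives concrete, here is a hedged plain-Python sketch. The source does not give the exact formulation, so the pairwise logistic form of the ranking loss, the label convention (1 for formal, 0 for informal), and the function names are all assumptions chosen for illustration.

```python
import math


def bce_loss(logit, label):
    """Binary cross-entropy for one example (assumed: 1 = formal, 0 = informal)."""
    # softplus(z) = log(1 + exp(z)), written in a numerically stable form.
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    return softplus - label * logit


def in_batch_ranking_loss(pred_scores, gold_scores):
    """Mean pairwise logistic loss over in-batch pairs with different gold scores."""
    losses = []
    n = len(pred_scores)
    for i in range(n):
        for j in range(n):
            if gold_scores[i] > gold_scores[j]:
                # Item i is rated more formal, so it should be scored higher.
                margin = pred_scores[i] - pred_scores[j]
                # log(1 + exp(-margin)), again in a stable form.
                losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return sum(losses) / len(losses) if losses else 0.0
```

A ranking objective of this kind only needs the relative order of the human ratings within a batch, which suits continuous formality scores better than a hard formal/informal threshold would.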

📄 License

This model is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

📚 Citation

If you are using the model in your research, please cite the following paper where it was introduced:

@InProceedings{10.1007/978-3-031-35320-8_4,
  author="Babakov, Nikolay
  and Dale, David
  and Gusev, Ilya
  and Krotova, Irina
  and Panchenko, Alexander",
  editor="M{\'e}tais, Elisabeth
  and Meziane, Farid
  and Sugumaran, Vijayan
  and Manning, Warren
  and Reiff-Marganiec, Stephan",
  title="Don't Lose the Message While Paraphrasing: A Study on Content Preserving Style Transfer",
  booktitle="Natural Language Processing and Information Systems",
  year="2023",
  publisher="Springer Nature Switzerland",
  address="Cham",
  pages="47--61",
  isbn="978-3-031-35320-8"
}
