English Sentence Formality Prediction Model
This model predicts whether an English sentence is formal or informal and can be used for text style analysis.
Features
- Trained to predict the formality of English sentences.
- Uses roberta-base as the base model.
- Trained on multiple datasets, including GYAFC and an online formality corpus.
- Applies data augmentation to reduce over-reliance on punctuation and capitalization.
Documentation
Model Details
- Base model: roberta-base
- Datasets: GYAFC and an online formality corpus (P&T)
- Data augmentation: changing texts to upper or lower case; removing all punctuation and adding a period at the end of the sentence. This prevents the model from over-relying on punctuation and capitalization.
- Loss: binary classification on GYAFC; in-batch ranking on the P&T data.
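The augmentation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' training code. The following are all assumptions: the helper names `strip_punctuation_add_period` and `augment`, the use of `string.punctuation` as the punctuation set, and picking one variant uniformly at random per training example.

```python
import random
import string

# Assumption: "punctuation" means Python's ASCII string.punctuation set.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation_add_period(text: str) -> str:
    """Remove all punctuation, collapse whitespace, then end with a single period."""
    words = text.translate(_PUNCT_TABLE).split()
    return " ".join(words) + "."


def augment(text: str, rng: random.Random) -> str:
    """Return one variant of `text`, chosen uniformly at random (hypothetical scheme)."""
    variants = [
        text,                                # unchanged original
        text.upper(),                        # upper case
        text.lower(),                        # lower case
        strip_punctuation_add_period(text),  # punctuation removed, period added
    ]
    return rng.choice(variants)


if __name__ == "__main__":
    rng = random.Random(0)
    print(strip_punctuation_add_period("Hey!!! How r u, mate?"))  # Hey How r u mate.
    print(augment("Dear Sir, I would like to apologise.", rng))
```

Applying such transforms during training shows the classifier the same sentence with and without its surface cues. The intent is that casing and punctuation alone then stop being reliable signals of formality.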
| Property | Details |
|---|---|
| Model Type | English Sentence Formality Prediction Model |
| Training Data | GYAFC, online formality corpus |

Performance Metrics on Test Data
Performance on GYAFC
| Dataset | ROC AUC | Precision | Recall | F-score | Accuracy | Spearman |
|---|---|---|---|---|---|---|
| GYAFC | 0.9779 | 0.90 | 0.91 | 0.90 | 0.9087 | 0.8233 |
| GYAFC normalized (lowercase + punctuation removed) | 0.9234 | 0.85 | 0.81 | 0.82 | 0.8218 | 0.7294 |
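The columns above are standard binary-classification metrics. The sketch below is a rough illustration of how they relate to a model's formality scores. It computes ROC AUC, the probability that a randomly chosen formal sentence scores higher than a randomly chosen informal one, with ties counted as half. It also computes threshold-based precision, recall, F-score and accuracy. Several choices are assumptions: the label encoding (1 = formal), the 0.5 threshold, and scoring only the formal class. The reported figures may use a different threshold or averaging, and the toy scores are made up.

```python
from itertools import product


def roc_auc(scores, labels):
    """ROC AUC as the chance a random formal sentence outscores a random
    informal one (ties count as half). labels: 1 = formal, 0 = informal."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both formal and informal examples")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores, labels, threshold=0.5):
    """Precision, recall and F1 for the formal class, plus accuracy.
    The 0.5 threshold is a hypothetical default."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": correct / len(labels)}


if __name__ == "__main__":
    # Toy scores, not real model outputs.
    scores = [0.9, 0.8, 0.4, 0.3, 0.6]
    labels = [1, 1, 1, 0, 0]
    print(roc_auc(scores, labels))
    print(threshold_metrics(scores, labels))
```

ROC AUC does not depend on a threshold, which is why it can stay high even when precision and recall at a fixed cut-off are lower.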
Performance on P&T subset
| P&T subset | Spearman R |
|---|---|
| news | 0.4003 |
| answers | 0.7500 |
| blog | 0.7334 |
| email | 0.7606 |
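For the P&T subset the table reports Spearman's rank correlation. This compares the ordering of sentences by model score with their ordering by human formality rating. Below is a minimal standard-library sketch that computes it as the Pearson correlation of the two rank vectors, giving tied values their average rank. The function names are illustrative and the toy numbers are invented.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    model_scores = [0.1, 0.7, 0.4, 0.9]     # toy model outputs
    human_ratings = [-2.0, 1.0, 0.5, 2.5]   # toy human formality ratings
    print(spearman(model_scores, human_ratings))
```

Because only ranks matter, the model's scores need not be on the same scale as the human ratings.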
Technical Details
The model is trained with a binary classification loss on the GYAFC dataset and an in-batch ranking loss on the P&T data. Data augmentation makes the model less dependent on punctuation and capitalization, so that it relies more on other features of the text.
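The two training objectives can be illustrated with a small pure-Python sketch. The exact form of the in-batch ranking loss is not specified here, so the version below is an assumption. It uses a pairwise logistic loss over every pair of sentences in a batch whose human ratings differ, which pushes the model to score the more formal sentence higher. The binary loss is sketched as ordinary binary cross-entropy on a single logit, which is also an assumption. All function names are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def binary_cross_entropy_with_logit(logit: float, label: int) -> float:
    """BCE for one GYAFC example (label 1 = formal, 0 = informal).

    Uses the identity -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
    = softplus(z) - y*z.
    """
    return softplus(logit) - label * logit


def in_batch_ranking_loss(pred_scores, gold_scores):
    """Hypothetical pairwise logistic ranking loss over a batch.

    For every pair (i, j) whose human rating says i is more formal than j,
    add log(1 + exp(s_j - s_i)). Pairs with equal gold ratings are skipped,
    and the total is averaged over the counted pairs.
    """
    total, n_pairs = 0.0, 0
    for s_i, g_i in zip(pred_scores, gold_scores):
        for s_j, g_j in zip(pred_scores, gold_scores):
            if g_i > g_j:
                total += softplus(s_j - s_i)
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


if __name__ == "__main__":
    print(binary_cross_entropy_with_logit(2.0, 1))
    # A correctly ordered batch should incur a lower loss than a reversed one.
    print(in_batch_ranking_loss([2.0, 0.0], [1.0, -1.0]))
    print(in_batch_ranking_loss([0.0, 2.0], [1.0, -1.0]))
```

A ranking objective suits data with graded formality ratings, because it only asks the model to order sentences correctly rather than to reproduce the rating scale.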
License
This model is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

Citation
If you use this model in your research, please cite the paper in which it was introduced:
@InProceedings{10.1007/978-3-031-35320-8_4,
author="Babakov, Nikolay
and Dale, David
and Gusev, Ilya
and Krotova, Irina
and Panchenko, Alexander",
editor="M{\'e}tais, Elisabeth
and Meziane, Farid
and Sugumaran, Vijayan
and Manning, Warren
and Reiff-Marganiec, Stephan",
title="Don't Lose the Message While Paraphrasing: A Study on Content Preserving Style Transfer",
booktitle="Natural Language Processing and Information Systems",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="47--61",
isbn="978-3-031-35320-8"
}