Inclusively Rewriting model
This is an Italian sequence-to-sequence model fine-tuned from IT5-large for inclusive language rewriting. It analyzes Italian sentences and rewrites them to be more inclusive when necessary. For instance, it can rewrite "I professori devono essere preparati" (The professors must be prepared) as "Il personale docente deve essere preparato" (The teaching staff must be prepared).
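A minimal sketch of how a sequence-to-sequence model like this one could be called through the Hugging Face `transformers` library. The model identifier below is a hypothetical placeholder, not the published one. Beam search and passing the raw sentence with no task prefix are also assumptions, since the source does not state the decoding or input format.

```python
# Hypothetical placeholder: replace with the actual Hub ID or local path of this model.
MODEL_ID = "your-org/inclusively-rewriting-it5"
MAX_LENGTH = 128  # same value as the max_length listed under the training procedure


def rewrite(sentence: str, model_id: str = MODEL_ID, num_beams: int = 4) -> str:
    """Return the model's rewriting of one Italian sentence."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: the sentence is passed as-is, with no task prefix.
    inputs = tokenizer(
        sentence, return_tensors="pt", truncation=True, max_length=MAX_LENGTH
    )
    # Assumption: beam search; the source does not specify decoding settings.
    output_ids = model.generate(**inputs, max_length=MAX_LENGTH, num_beams=num_beams)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With a valid identifier, `rewrite("I professori devono essere preparati")` would return the model's proposed rewriting as a string. Loading the tokenizer and model once and reusing them is preferable when rewriting many sentences.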
Documentation
Training data
The model was trained on a dataset with 4705 sentence pairs, each having an inclusive and a non-inclusive sentence. The dataset split is as follows:
- Training set: 3764 pairs
- Validation set: 470 pairs
- Test set: 471 pairs
A small set of 75 synthetic pairs (generated by rules) was added to the training set to improve the model's test-set performance, giving 3764 + 75 = 3839 training pairs in total. The dataset was manually annotated by inclusive language experts and is not publicly available yet.
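The split sizes above can be checked with a few lines of arithmetic. This sketch only restates the counts given in this section; the variable names are illustrative.

```python
TOTAL_PAIRS = 4705
SPLITS = {"train": 3764, "validation": 470, "test": 471}
SYNTHETIC_PAIRS = 75  # rule-generated pairs added to the training set only

# The three splits should account for every annotated pair.
assert sum(SPLITS.values()) == TOTAL_PAIRS

# Share of each split, as a percentage rounded to one decimal.
fractions = {name: round(100 * n / TOTAL_PAIRS, 1) for name, n in SPLITS.items()}

# Training set size once the synthetic pairs are added, and their share of it.
train_with_synthetic = SPLITS["train"] + SYNTHETIC_PAIRS
synthetic_share = round(100 * SYNTHETIC_PAIRS / train_with_synthetic, 2)

print(fractions)             # {'train': 80.0, 'validation': 10.0, 'test': 10.0}
print(train_with_synthetic)  # 3839
print(synthetic_share)       # 1.95
```

In other words, the data follows an 80/10/10 split, and the synthetic pairs make up just under 2% of the final training set.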
Training procedure
The model was fine-tuned from IT5-large with these hyperparameters:
- max_length: 128
- batch_size: 8
- learning_rate: 5e-5
- warmup_steps: 500
- epochs: 25 (the best model is selected based on validation BLEU score)
- optimizer: AdamW
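The step counts implied by these hyperparameters can be sketched as follows. The calculation assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these is stated in the source.

```python
import math

TRAIN_PAIRS = 3764 + 75  # original training pairs plus synthetic pairs
BATCH_SIZE = 8
EPOCHS = 25
WARMUP_STEPS = 500

# Assumption: one optimizer step per batch, last partial batch included.
steps_per_epoch = math.ceil(TRAIN_PAIRS / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_fraction = WARMUP_STEPS / total_steps

print(steps_per_epoch)            # 480
print(total_steps)                # 12000
print(round(warmup_fraction, 4))  # 0.0417
```

Under these assumptions, warmup covers roughly the first 4% of training, a little more than one full epoch.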
Evaluation results
The model was evaluated on the test set, and here are the results:
| Model | BLEU | ROUGE-2 F1 | Human Correct | Human Partial (L) | Human Incorrect (L) |
|---|---|---|---|---|---|
| IT5 (no synth. data) | 80.32 | 87.17 | 64.76 | 15.71 | 19.52 |
| This | 80.79 | 87.47 | 69.52 | 17.14 | 13.22 |
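(L) next to a metric name means "lower is better". Compared with the model trained without synthetic data, this model scores higher on BLEU (+0.47) and ROUGE-2 F1 (+0.30), produces more outputs judged correct by humans (+4.76 points) and fewer judged incorrect (-6.30 points), while the share judged partially correct rises by 1.43 points. Other comparisons can be found in the paper.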
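As a small illustration of how the two rows compare, the sketch below computes the per-metric differences and applies the "lower is better" convention. The numbers are copied from the table above; the variable names are illustrative.

```python
METRICS = ["BLEU", "ROUGE-2 F1", "Human Correct", "Human Partial", "Human Incorrect"]
LOWER_IS_BETTER = {"Human Partial", "Human Incorrect"}  # the (L) columns

baseline = [80.32, 87.17, 64.76, 15.71, 19.52]    # IT5 (no synth. data)
this_model = [80.79, 87.47, 69.52, 17.14, 13.22]  # with synthetic data

# Difference per metric (this model minus baseline), rounded to two decimals.
deltas = {m: round(t - b, 2) for m, b, t in zip(METRICS, baseline, this_model)}

# A metric improved if it moved in its preferred direction.
improved = {m: (d < 0 if m in LOWER_IS_BETTER else d > 0) for m, d in deltas.items()}

for metric in METRICS:
    print(f"{metric}: {deltas[metric]:+.2f} improved={improved[metric]}")
```

Four of the five metrics move in the preferred direction; only Human Partial does not.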
Citation
If you use this model, please cite the following papers:
Main paper:
@article{10.1145/3729237,
author = {Greco, Salvatore and La Quatra, Moreno and Cagliero, Luca and Cerquitelli, Tania},
title = {Towards AI-Assisted Inclusive Language Writing in Italian Formal Communications},
year = {2025},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {2157-6904},
url = {https://doi.org/10.1145/3729237},
doi = {10.1145/3729237},
note = {Just Accepted},
journal = {ACM Trans. Intell. Syst. Technol.},
month = apr,
}
Demo paper:
@InProceedings{PKDD23_inclusively,
author="La Quatra, Moreno
and Greco, Salvatore
and Cagliero, Luca
and Cerquitelli, Tania",
title="Inclusively: An AI-Based Assistant for Inclusive Writing",
booktitle="Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="361--365",
isbn="978-3-031-43430-3",
doi="10.1007/978-3-031-43430-3_31"
}
License
The model is released under the cc-by-nc-sa-4.0 license.