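🚀 MilaNLP EDOS Task Model
This model was trained and released as part of MilaNLP's solution to the EDOS (Explainable Detection of Online Sexism) shared task. It is intended as a tool for explainable detection of online sexism and to support research and applications in that area.
🚀 Quick Start
For more details, see the paper MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection.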
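📚 Documentation
Adaptation Details
We domain-adapted a pretrained DeBERTa model using the standard masked language modeling (MLM) objective. The training data came from the unlabeled Reddit corpus provided by the task organizers (1M posts) (Kirk et al., 2023) and the Gab Hate Corpus (87K posts) (Kennedy et al., 2022). After concatenating and shuffling the two datasets, we held out 5% as validation data, stratified by data source. The final training dataset contains roughly 20 million words.
See the paper referenced above for full details.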
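The hold-out step described above (concatenate the two corpora, shuffle, and keep 5% for validation, stratified by data source) can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_holdout`, the `(source, text)` tuple layout, the seed, and the toy corpora are all assumptions made for illustration, and the real corpora are far larger.

```python
import random
from collections import defaultdict


def stratified_holdout(examples, holdout_frac=0.05, seed=0):
    """Split (source, text) pairs into train/validation lists.

    The validation share is taken separately from each source, so both
    splits keep roughly the same source proportions as the full data.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group examples by their data source (e.g. "reddit", "gab").
    by_source = defaultdict(list)
    for source, text in examples:
        by_source[source].append((source, text))

    train, valid = [], []
    for source in sorted(by_source):
        items = by_source[source]
        rng.shuffle(items)
        n_valid = round(len(items) * holdout_frac)
        valid.extend(items[:n_valid])
        train.extend(items[n_valid:])

    # Shuffle again so the sources are interleaved within each split.
    rng.shuffle(train)
    rng.shuffle(valid)
    return train, valid


# Toy stand-ins for the two corpora (sizes scaled down, assumed).
reddit = [("reddit", f"reddit post {i}") for i in range(1000)]
gab = [("gab", f"gab post {i}") for i in range(87)]

train, valid = stratified_holdout(reddit + gab, holdout_frac=0.05)
print(len(train), len(valid))
```

Splitting per source before merging keeps the smaller corpus represented in the validation set, which a plain random 5% sample would not guarantee.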
📄 License
This project is licensed under Apache-2.0.
📚 Citation
If you use this model, please consider citing the following paper:
@inproceedings{cercas-curry-etal-2023-milanlp,
title = "{M}ila{NLP} at {S}em{E}val-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection",
author = "Cercas Curry, Amanda and
Attanasio, Giuseppe and
Nozza, Debora and
Hovy, Dirk",
booktitle = "Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.semeval-1.285",
doi = "10.18653/v1/2023.semeval-1.285",
pages = "2067--2074",
abstract = "We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an ensemble modeling approach to combine different classifiers trained with domain adaptation objectives and standard fine-tuning. Our results show that the ensemble is more robust than individual models and that regularized models generate more {``}conservative{''} predictions, mitigating the effects of lexical overfitting. However, our error analysis also finds that many of the misclassified instances are debatable, raising questions about the objective annotatability of hate speech data.",
}