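🚀 MilaNLP EDOS Task Model
This model was trained and released as part of MilaNLP's solution to the EDOS shared task. It is intended to support research on, and applications of, the explainable detection of online sexism.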
🚀 Quick Start
For more details on how the model was trained and used, see the paper MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection.
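📚 Detailed Documentation
Adaptation Details
We domain-adapted a pretrained DeBERTa model with the standard masked language modeling (MLM) objective. The training data came from the unlabeled Reddit corpus provided by the task organizers (1M posts) (Kirk et al., 2023) and the Gab Hate Corpus (87K posts) (Kennedy et al., 2022). After concatenating and shuffling the two datasets, we held out 5% as validation data, stratified by data source. The final training dataset contains roughly 20M words.
For full details, please refer to the paper cited above.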
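The data preparation described above (concatenate, shuffle, hold out 5% stratified by source) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function name `stratified_holdout`, the seed, the rounding rule, and the toy post lists are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_holdout(posts, sources, val_fraction=0.05, seed=0):
    """Hold out `val_fraction` of the posts from each source for validation.

    Hypothetical helper for illustration; the paper's exact procedure may differ.
    """
    if len(posts) != len(sources):
        raise ValueError("posts and sources must have the same length")

    rng = random.Random(seed)

    # Group posts by the corpus they came from.
    by_source = defaultdict(list)
    for post, source in zip(posts, sources):
        by_source[source].append(post)

    train, val = [], []
    for source in sorted(by_source):
        group = by_source[source]
        rng.shuffle(group)
        # Take the same fraction from every source so the validation
        # set keeps the source proportions of the full corpus.
        n_val = round(len(group) * val_fraction)
        val.extend((post, source) for post in group[:n_val])
        train.extend((post, source) for post in group[n_val:])

    # Shuffle again so the two sources are interleaved.
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val


# Toy stand-ins for the Reddit and Gab corpora (assumed data, not the real posts).
reddit = [f"reddit post {i}" for i in range(1000)]
gab = [f"gab post {i}" for i in range(100)]

posts = reddit + gab
sources = ["reddit"] * len(reddit) + ["gab"] * len(gab)

train, val = stratified_holdout(posts, sources)
```

With the toy data, each source contributes 5% of its posts to the validation split, so the much smaller Gab corpus is still represented in proportion to its size.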
📄 License
This project is released under the Apache-2.0 license.
📚 Citation
If you use this model, please consider citing the following paper:
@inproceedings{cercas-curry-etal-2023-milanlp,
title = "{M}ila{NLP} at {S}em{E}val-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection",
author = "Cercas Curry, Amanda and
Attanasio, Giuseppe and
Nozza, Debora and
Hovy, Dirk",
booktitle = "Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.semeval-1.285",
doi = "10.18653/v1/2023.semeval-1.285",
pages = "2067--2074",
abstract = "We present the system proposed by the MilaNLP team for the Explainable Detection of Online Sexism (EDOS) shared task. We propose an ensemble modeling approach to combine different classifiers trained with domain adaptation objectives and standard fine-tuning. Our results show that the ensemble is more robust than individual models and that regularized models generate more {``}conservative{''} predictions, mitigating the effects of lexical overfitting. However, our error analysis also finds that many of the misclassified instances are debatable, raising questions about the objective annotatability of hate speech data.",
}