🚀 ArabicTransformer Small Model (B6-6-6 with decoder)
This ArabicTransformer small model is built with the Funnel Transformer and the ELECTRA objective. It delivers strong performance on Arabic downstream tasks while requiring considerably less compute for pre-training.
🚀 Quick Start
You can get started with the model in a few lines, as sketched below:
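The snippet below is a minimal sketch of loading the model with the Hugging Face Transformers library. The checkpoint identifier `sultan/ArabicTransformer-small` is an assumption and may differ from the actual repository name.

```python
# Minimal usage sketch (assumes the checkpoint id "sultan/ArabicTransformer-small";
# replace it with the actual repository name if it differs).
from transformers import AutoTokenizer, AutoModel

model_name = "sultan/ArabicTransformer-small"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode an Arabic sentence and extract contextual token representations.
inputs = tokenizer("اللغة العربية جميلة", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```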
✨ Key Features
- Efficient pre-training: built on the Funnel Transformer with the ELECTRA objective and pre-trained on a 44GB Arabic corpus, which significantly reduces the pre-training cost.
- Strong downstream performance: achieves state-of-the-art results on several Arabic downstream tasks despite using less compute than other BERT-based models.
- Parameter/speed trade-off: has more parameters (1.39x) than the ELECTRA-base architecture, with similar or slightly longer inference and fine-tuning times.
📚 Documentation
Paper
Abstract
Pre-training Transformer-based models such as BERT and ELECTRA on a collection of Arabic corpora, as demonstrated by both AraBERT and AraELECTRA, shows impressive results on downstream tasks. However, pre-training Transformer-based language models is computationally expensive, especially for large-scale models. Recently, Funnel Transformer has addressed the sequential redundancy inside the Transformer architecture by compressing the sequence of hidden states, leading to a significant reduction in pre-training cost. This paper empirically studies the performance and efficiency of building an Arabic language model with Funnel Transformer and the ELECTRA objective. We find that our model achieves state-of-the-art results on several Arabic downstream tasks despite using less computational resources compared to other BERT-based models.
Model Description
The model was pre-trained on a 44GB Arabic corpus using the Funnel Transformer with the ELECTRA objective. Compared with the ELECTRA-base architecture, it has more parameters (1.39x), with similar or slightly longer inference and fine-tuning times. It was pre-trained with significantly fewer resources than state-of-the-art models.
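As a rough illustration (not part of the original model card), the sketch below shows how one might inspect the funnel block layout and the parameter count. It assumes the checkpoint id `sultan/ArabicTransformer-small` and that the checkpoint is stored as a Transformers Funnel model, whose configuration exposes a `block_sizes` field.

```python
# Sketch: inspect the funnel block layout and parameter count.
# Assumes the checkpoint id "sultan/ArabicTransformer-small" and that it uses
# the Transformers Funnel configuration (which exposes `block_sizes`).
from transformers import AutoConfig, AutoModel

model_name = "sultan/ArabicTransformer-small"  # assumed checkpoint id
config = AutoConfig.from_pretrained(model_name)
print(getattr(config, "block_sizes", None))  # expected to reflect the B6-6-6 layout

model = AutoModel.from_pretrained(model_name)
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.1f}M")
```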
Results on the Arabic TyDi QA Task
| Model | EM | F1 |
| --- | --- | --- |
| AraBERT02-Large | 73.72 | 86.03 |
| AraELECTRA-Base | 74.91 | 86.68 |
| ArabicTransformer-Small | 74.70 | 85.89 |
| ArabicTransformer-Base | 75.57 | 87.22 |
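For extractive QA tasks such as TyDi QA, a fine-tuned checkpoint can be used through the standard question-answering pipeline in Transformers. The sketch below uses a hypothetical fine-tuned checkpoint name; substitute whichever fine-tuned model you actually have.

```python
# Sketch: extractive QA inference with a fine-tuned checkpoint.
# The checkpoint id below is hypothetical; replace it with a real fine-tuned model.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="sultan/ArabicTransformer-base-finetuned-tydiqa",  # hypothetical id
)
result = qa(
    question="ما هي عاصمة المملكة العربية السعودية؟",
    context="الرياض هي عاصمة المملكة العربية السعودية وأكبر مدنها.",
)
print(result)
```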
Acknowledgements
We would like to acknowledge the TPU Research Cloud (TRC) team for providing us with access to TPUv3 units.
BibTeX Citation
@inproceedings{alrowili-shanker-2021-arabictransformer-efficient,
title = "{A}rabic{T}ransformer: Efficient Large {A}rabic Language Model with Funnel Transformer and {ELECTRA} Objective",
author = "Alrowili, Sultan and
Shanker, Vijay",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-emnlp.108",
pages = "1255--1261",
abstract = "Pre-training Transformer-based models such as BERT and ELECTRA on a collection of Arabic corpora, demonstrated by both AraBERT and AraELECTRA, shows an impressive result on downstream tasks. However, pre-training Transformer-based language models is computationally expensive, especially for large-scale models. Recently, Funnel Transformer has addressed the sequential redundancy inside Transformer architecture by compressing the sequence of hidden states, leading to a significant reduction in the pre-training cost. This paper empirically studies the performance and efficiency of building an Arabic language model with Funnel Transformer and ELECTRA objective. We find that our model achieves state-of-the-art results on several Arabic downstream tasks despite using less computational resources compared to other BERT-based models.",
}
🔗 Project Links
- Paper: https://aclanthology.org/2021.findings-emnlp.108