🚀 ArabicTransformer Small Model (B6-6-6 with decoder)
This ArabicTransformer small model is built with the Funnel Transformer and the ELECTRA objective. It delivers efficient, strong performance on Arabic downstream tasks while being considerably cheaper to pre-train.
🚀 Quick Start
You can quickly try out the model via the links below; a minimal loading sketch also follows.
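The sketch below shows one way to load the checkpoint with the Hugging Face Transformers library. The model ID `sultan/ArabicTransformer-small` is an assumed placeholder; substitute the actual Hub repository name for this model.

```python
# Minimal loading sketch. The model ID below is a placeholder assumption --
# replace it with the actual Hugging Face Hub repository name.
from transformers import AutoTokenizer, AutoModel

model_id = "sultan/ArabicTransformer-small"  # hypothetical/placeholder model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode an Arabic sentence and obtain contextual hidden states.
inputs = tokenizer("اللغة العربية جميلة", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```

Because this checkpoint is a B6-6-6 layout with a decoder, the returned hidden states cover the full input sequence length, so the model can be fine-tuned for token-level tasks such as extractive question answering.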
✨ Key Features
- Efficient pre-training: built with the Funnel Transformer and the ELECTRA objective and pre-trained on a 44GB Arabic corpus, which significantly reduces the pre-training cost.
- Strong performance: achieves state-of-the-art results on several Arabic downstream tasks despite using less compute than other BERT-based models.
- Parameter/time trade-off: compared with the ELECTRA-base architecture, the model has more parameters (1.39x) with similar or slightly longer inference and fine-tuning time.
📚 Documentation
Paper
Abstract
Pre-training Transformer-based models such as BERT and ELECTRA on a collection of Arabic corpora, demonstrated by both AraBERT and AraELECTRA, shows an impressive result on downstream tasks. However, pre-training Transformer-based language models is computationally expensive, especially for large-scale models. Recently, Funnel Transformer has addressed the sequential redundancy inside Transformer architecture by compressing the sequence of hidden states, leading to a significant reduction in the pre-training cost. This paper empirically studies the performance and efficiency of building an Arabic language model with Funnel Transformer and ELECTRA objective. We find that our model achieves state-of-the-art results on several Arabic downstream tasks despite using less computational resources compared to other BERT-based models.
Model Description
The model was pre-trained on a 44GB Arabic corpus using the Funnel Transformer with the ELECTRA objective. Compared with the ELECTRA-base architecture, it has more parameters (1.39x) and similar or slightly longer inference and fine-tuning time, while requiring significantly fewer resources to pre-train than state-of-the-art models.
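As a rough illustration of what the B6-6-6 (with decoder) layout means in terms of the Funnel Transformer configuration in Hugging Face Transformers, the sketch below builds a `FunnelConfig` with three encoder blocks of six layers each. The hidden size, head count, and decoder depth shown here are illustrative assumptions, not the released checkpoint's exact configuration.

```python
# Sketch of how a B6-6-6 Funnel Transformer layout maps onto a transformers FunnelConfig.
# block_sizes=[6, 6, 6] encodes three encoder blocks of six layers each; the remaining
# values (hidden size, heads, decoder depth) are assumptions, not the released config.
from transformers import FunnelConfig, FunnelModel

config = FunnelConfig(
    block_sizes=[6, 6, 6],   # B6-6-6: three blocks, six layers per block
    num_decoder_layers=2,    # "with decoder": decoder layers restore full sequence length (assumed value)
    d_model=768,             # assumed hidden size
    n_head=12,               # assumed number of attention heads
)

model = FunnelModel(config)  # randomly initialized model, used only to inspect the architecture
print(sum(p.numel() for p in model.parameters()))  # rough parameter count for this layout
```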
Arabic TyDi QA Task Results

| Model | EM | F1 |
| --- | --- | --- |
| AraBERT02-Large | 73.72 | 86.03 |
| AraELECTRA-Base | 74.91 | 86.68 |
| ArabicTransformer-Small | 74.70 | 85.89 |
| ArabicTransformer-Base | 75.57 | 87.22 |
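For context on how such TyDi QA numbers are used at inference time, here is a hedged sketch of extractive question answering with a fine-tuned checkpoint. The checkpoint path below is hypothetical; replace it with a model actually fine-tuned on Arabic TyDi QA.

```python
# Sketch of extractive QA inference, the setting evaluated in the table above.
# "path/to/arabictransformer-finetuned-tydiqa" is a hypothetical path to a checkpoint
# fine-tuned on Arabic TyDi QA; replace it with your own fine-tuned model.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

ckpt = "path/to/arabictransformer-finetuned-tydiqa"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForQuestionAnswering.from_pretrained(ckpt)

question = "ما هي عاصمة مصر؟"   # "What is the capital of Egypt?"
context = "القاهرة هي عاصمة جمهورية مصر العربية وأكبر مدنها."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the most likely answer span from the start/end logits.
start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits)) + 1
answer = tokenizer.decode(inputs["input_ids"][0][start:end], skip_special_tokens=True)
print(answer)
```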
Acknowledgements
We would like to thank the TPU Research Cloud (TRC) team for providing us with access to TPUv3 units.
BibTeX Citation
@inproceedings{alrowili-shanker-2021-arabictransformer-efficient,
title = "{A}rabic{T}ransformer: Efficient Large {A}rabic Language Model with Funnel Transformer and {ELECTRA} Objective",
author = "Alrowili, Sultan and
Shanker, Vijay",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.findings-emnlp.108",
pages = "1255--1261",
abstract = "Pre-training Transformer-based models such as BERT and ELECTRA on a collection of Arabic corpora, demonstrated by both AraBERT and AraELECTRA, shows an impressive result on downstream tasks. However, pre-training Transformer-based language models is computationally expensive, especially for large-scale models. Recently, Funnel Transformer has addressed the sequential redundancy inside Transformer architecture by compressing the sequence of hidden states, leading to a significant reduction in the pre-training cost. This paper empirically studies the performance and efficiency of building an Arabic language model with Funnel Transformer and ELECTRA objective. We find that our model achieves state-of-the-art results on several Arabic downstream tasks despite using less computational resources compared to other BERT-based models.",
}
🔗 Project Links