🚀 Chinese Pretrained Longformer Model | Longformer_ZH with PyTorch
This project provides a pre-trained Chinese Longformer model. In contrast to the O(n^2) complexity of the standard Transformer, Longformer processes document-level long sequences with linear complexity. Its attention mechanism combines local windowed self-attention with global attention, helping the model learn from ultra-long sequences. Because resources for Chinese Longformer models and long-sequence Chinese tasks are scarce, we open-source our pre-trained model parameters together with the corresponding loading code and pre-training scripts.
🚀 Quick Start
✨ Features
- Efficient Processing: Handles document-level long sequences with linear complexity, in contrast to the O(n^2) complexity of the standard Transformer.
- Combined Attention Mechanism: Integrates local windowed attention with global attention to better capture information from long sequences (a minimal sketch follows this list).
- Chinese Adaptation: Pre-trained specifically for Chinese tasks, with the Whole-Word-Masking mechanism introduced for a better fit to the language.
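To make the combined attention pattern concrete, here is a minimal, hypothetical sketch built on the standard transformers Longformer classes; the vocabulary size, window width, and sequence length below are illustrative and do not describe the released checkpoint.

```python
import torch
from transformers import LongformerConfig, LongformerModel

# Illustrative configuration only (randomly initialized, not the released weights):
# a sliding window of 128 tokens keeps attention cost at O(n * w) instead of O(n^2).
config = LongformerConfig(
    vocab_size=21128,              # assumed BERT-style Chinese vocab size
    attention_window=128,
    max_position_embeddings=4098,
)
model = LongformerModel(config)

input_ids = torch.randint(100, config.vocab_size, (1, 1024))  # avoid special token ids
attention_mask = torch.ones_like(input_ids)

# Give the first ([CLS]) token global attention so it can attend to, and be
# attended by, every position in the sequence.
global_attention_mask = torch.zeros_like(input_ids)
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(
        input_ids,
        attention_mask=attention_mask,
        global_attention_mask=global_attention_mask,
    )
print(outputs.last_hidden_state.shape)  # torch.Size([1, 1024, 768])
```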
📦 Installation
You can download our model from Google Drive or Baidu Yun:
- Google Drive: https://drive.google.com/file/d/1IDJ4aVTfSFUQLIqCYBtoRpnfbgHPoxB4/view?usp=sharing
- Baidu Yun: https://pan.baidu.com/s/1HaVDENx52I7ryPFpnQmq1w (extraction code: y601)
We also support automatic loading with HuggingFace Transformers:
```python
from Longformer_zh import LongformerZhForMaksedLM

model = LongformerZhForMaksedLM.from_pretrained('ValkyriaLenneth/longformer_zh')
```
⚠️ Important Note
- Please use `transformers.LongformerModel.from_pretrained` to load the model directly (see the loading sketch after this list).
- The notes below are obsolete; please ignore them: Unlike the original English Longformer, Longformer_Zh is based on Roberta_zh, which is a subclass of `transformers.BertModel` rather than `RobertaModel`, so it cannot be loaded directly with the original Longformer code. We provide a modified `Longformer_zh` class for loading the model. If you want to use our model on more downstream tasks, please refer to `Longformer_zh.py` and replace the attention layers with Longformer attention layers.
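A minimal loading sketch following the note above. The tokenizer class is an assumption: Longformer_ZH is derived from Roberta_zh, which uses a BERT-style Chinese vocabulary, so a BertTokenizer is used here; substitute whatever tokenizer files ship with the checkpoint you downloaded.

```python
import torch
from transformers import LongformerModel, BertTokenizer

# Load the published weights directly with the stock Longformer class,
# as recommended above. The tokenizer choice is an assumption (BERT-style vocab).
model = LongformerModel.from_pretrained("ValkyriaLenneth/longformer_zh")
tokenizer = BertTokenizer.from_pretrained("ValkyriaLenneth/longformer_zh")

text = "这是一个用于超长中文文本编码的示例。" * 200   # a long document
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

with torch.no_grad():
    outputs = model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
    )
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```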
🔧 Technical Details
- Pretraining Corpus: The pre-training corpus comes from https://github.com/brightmart/nlp_chinese_corpus. Following the Longformer paper, we use a mixture of four different Chinese corpora.
- Model Baseline: Our model is based on Roberta_zh_mid (https://github.com/brightmart/roberta_zh). The pre-training scripts are adapted from https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb.
- Whole-Word-Masking: We introduce Whole-Word-Masking into pre-training to better fit the Chinese language. Our WWM scripts are refactored from Roberta_zh_Tensorflow and are, as far as we know, the first open-source whole-word-masking implementation in PyTorch (a simplified sketch follows this list).
- Model Parameters: The model uses `max_seq_length = 4096`. Pre-training took about 4 days on 4 * Titan RTX GPUs. We used Nvidia Apex for mixed-precision training to speed up pre-training. For data pre-processing, we used Jieba for Chinese word segmentation and JIONLP for data cleaning.
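To illustrate the idea, here is a simplified whole-word-masking sketch (not the project's actual pre-training script): Jieba segments the sentence into words, and all sub-tokens of a selected word are masked together. The bert-base-chinese tokenizer is only a stand-in vocabulary for this example.

```python
import random
import jieba
from transformers import BertTokenizer

# Stand-in vocabulary for illustration; the real script uses the project's own vocab.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

def whole_word_mask(text, mask_prob=0.15):
    """Mask every sub-token of a sampled word together, instead of masking
    individual characters/sub-tokens independently."""
    tokens, mask_flags = [], []
    for word in jieba.cut(text):                  # Chinese word segmentation
        pieces = tokenizer.tokenize(word)
        mask_whole_word = random.random() < mask_prob
        for piece in pieces:
            tokens.append(tokenizer.mask_token if mask_whole_word else piece)
            mask_flags.append(mask_whole_word)
    return tokens, mask_flags

tokens, flags = whole_word_mask("超长文本建模需要高效的注意力机制")
print(tokens)
```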
📚 Documentation
Evaluation
We conducted evaluations on several tasks:
CCF Sentiment Analysis
Since open-source long-sequence Chinese NLP benchmarks are hard to come by, we used the CCF Sentiment Analysis task for evaluation.
| Model | Dev F1 |
| --- | --- |
| Bert | 80.3 |
| Bert-wwm-ext | 80.5 |
| Roberta-mid | 80.5 |
| Roberta-large | 81.25 |
| Longformer_SC | 79.37 |
| Longformer_ZH | 80.51 |
Pretraining BPC
We also report the pre-training BPC (bits-per-character). The lower the BPC, the better the language model; it can be read as a character-level analogue of perplexity (PPL), as the conversion sketch after the table shows.
| Model | BPC |
| --- | --- |
| Longformer before pre-training | 14.78 |
| Longformer after pre-training | 3.10 |
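For reference, BPC is character-level cross-entropy expressed in bits, so it converts directly to a per-character perplexity; the arithmetic below is only a reading aid, not part of the released evaluation code.

```python
import math

# BPC = cross-entropy in nats / ln(2); per-character perplexity = 2 ** BPC.
def bpc_from_nats(cross_entropy_nats: float) -> float:
    return cross_entropy_nats / math.log(2)

def char_perplexity(bpc: float) -> float:
    return 2.0 ** bpc

print(char_perplexity(14.78))  # before pre-training: ~2.8e4
print(char_perplexity(3.10))   # after pre-training:  ~8.6
```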
CMRC (Chinese Machine Reading Comprehension)
| Model | F1 | EM |
| --- | --- | --- |
| Bert | 85.87 | 64.90 |
| Roberta | 86.45 | 66.57 |
| Longformer_zh | 86.15 | 66.84 |
Chinese Coreference Resolution
| Model | Conll-F1 | Precision | Recall |
| --- | --- | --- | --- |
| Bert | 66.82 | 70.30 | 63.67 |
| Roberta | 67.77 | 69.28 | 66.32 |
| Longformer_zh | 67.81 | 70.13 | 65.64 |
Acknowledgments
Thanks to the Okumura-Funakoshi Lab at the Tokyo Institute of Technology for providing the computing resources and the opportunity to complete this project.