StructBERT: Unofficial Copy
This is an unofficial copy of StructBERT. It extends BERT by incorporating language structures into pre-training, offering enhanced language understanding.
🚀 Quick Start
Reproduce the HF Hub models
Download the model config, weights, and tokenizer vocab:
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_large_bert_config.json && mv ch_large_bert_config.json config.json
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_vocab.txt && mv ch_vocab.txt vocab.txt
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model && mv ch_model pytorch_model.bin
from transformers import BertConfig, BertModel, BertTokenizer
config = BertConfig.from_pretrained("./config.json")
model = BertModel.from_pretrained("./", config=config)
tokenizer = BertTokenizer.from_pretrained("./")
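# Pushing requires authentication with the Hugging Face Hub, e.g. via `huggingface-cli login`.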
model.push_to_hub("structbert-large-zh")
tokenizer.push_to_hub("structbert-large-zh")
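Once pushed, the checkpoint can be loaded straight from the Hub with the standard `transformers` API. A minimal sketch, assuming the repository ended up under your own namespace as `structbert-large-zh` (the namespace below is a placeholder):

```python
import torch
from transformers import BertModel, BertTokenizer

repo_id = "your-username/structbert-large-zh"  # placeholder namespace

tokenizer = BertTokenizer.from_pretrained(repo_id)
model = BertModel.from_pretrained(repo_id)
model.eval()

# Encode a short Chinese sentence and extract contextual embeddings.
inputs = tokenizer("今天天气不错", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, sequence_length, 1024) for the large model
```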
The official paper can be found at https://arxiv.org/abs/1908.04577.
✨ Features
We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word and sentence levels, respectively.
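For intuition, the word-level task corrupts a span of tokens by shuffling it and trains the model to reconstruct the original order (the paper uses trigrams), while the sentence-level task predicts whether a sentence pair appears in order, is swapped, or comes from different documents. The snippet below is only a conceptual sketch of the word-level corruption step, not the authors' pre-training code; the span length and positions are illustrative.

```python
import random
from typing import List, Tuple

def shuffle_span(token_ids: List[int], span_len: int = 3, seed: int = None) -> Tuple[List[int], List[int]]:
    """Conceptual sketch of StructBERT's word structural objective:
    permute one randomly chosen span (a trigram in the paper) and keep
    the original order as the reconstruction target.
    Illustrative only, not the official pre-training code."""
    rng = random.Random(seed)
    if len(token_ids) < span_len:
        return token_ids, token_ids
    start = rng.randrange(len(token_ids) - span_len + 1)
    original = token_ids[start:start + span_len]       # labels: original order
    shuffled = original[:]
    rng.shuffle(shuffled)
    corrupted = token_ids[:start] + shuffled + token_ids[start + span_len:]
    return corrupted, original                          # model input, labels

# Example: the model is trained to restore the original order of the shuffled span.
corrupted, labels = shuffle_span([101, 2023, 2003, 1037, 3231, 102], seed=0)
```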
📦 Installation
Requirements and Installation
- PyTorch version >= 1.0.1
- Install other libraries via
pip install -r requirements.txt
- For faster training, install NVIDIA's apex library
💻 Usage Examples
Basic Usage
Finetune MNLI
python run_classifier_multi_task.py \
--task_name MNLI \
--do_train \
--do_eval \
--do_test \
--amp_type O1 \
--lr_decay_factor 1 \
--dropout 0.1 \
--do_lower_case \
--detach_index -1 \
--core_encoder bert \
--data_dir path_to_glue_data \
--vocab_file config/vocab.txt \
--bert_config_file config/large_bert_config.json \
--init_checkpoint path_to_pretrained_model \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--fast_train \
--gradient_accumulation_steps 1 \
--output_dir path_to_output_dir
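For reference, `--task_name MNLI` points the script at the GLUE MNLI sentence-pair entailment data expected under `--data_dir`. The snippet below only illustrates what an MNLI example looks like, using the Hugging Face `datasets` library (not a dependency of this repo); the training script itself reads the GLUE TSV files from disk.

```python
from datasets import load_dataset

# Peek at a few MNLI training examples to see the input format:
# a premise/hypothesis pair with a 3-way entailment label.
mnli = load_dataset("glue", "mnli", split="train[:3]")
for example in mnli:
    print(example["premise"])
    print(example["hypothesis"])
    print(example["label"])  # 0 = entailment, 1 = neutral, 2 = contradiction
```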
📚 Documentation
Pre-trained models
| Property | Details |
|----------|---------|
| Model Type | structbert.en.large: StructBERT using the BERT-large architecture; structroberta.en.large: StructRoBERTa, continued training from RoBERTa; structbert.ch.large: Chinese StructBERT using the BERT-large architecture |
| #params | structbert.en.large: 340M; structroberta.en.large: 355M; structbert.ch.large: 330M |
| Download | structbert.en.large: structbert.en.large; structroberta.en.large: coming soon; structbert.ch.large: structbert.ch.large |
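As a rough sanity check of the parameter counts above, you can count the weights of a locally prepared checkpoint directly. A sketch, assuming the Chinese model files were downloaded and renamed as in the Quick Start section:

```python
from transformers import BertConfig, BertModel

# Assumes config.json and pytorch_model.bin from the Quick Start steps
# are present in the current directory.
config = BertConfig.from_pretrained("./config.json")
model = BertModel.from_pretrained("./", config=config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # should land near the ~330M listed above
```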
Results
The GLUE and CLUE results below can be reproduced using the hyperparameters listed in the "Usage Examples" section above.
structbert.en.large
GLUE benchmark
| Model | MNLI | QNLIv2 | QQP | SST-2 | MRPC |
|-------|------|--------|-----|-------|------|
| structbert.en.large | 86.86% | 93.04% | 91.67% | 93.23% | 86.51% |
structbert.ch.large
CLUE benchmark
| Model | CMNLI | OCNLI | TNEWS | AFQMC |
|-------|-------|-------|-------|-------|
| structbert.ch.large | 84.47% | 81.28% | 68.67% | 76.11% |
📄 License
Citation
If you use StructBERT in your work, please cite:
@article{wang2019structbert,
title={Structbert: Incorporating language structures into pre-training for deep language understanding},
author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
journal={arXiv preprint arXiv:1908.04577},
year={2019}
}
⚠️ Important Note
This model card was not produced by the AliceMind team.