StructBERT: Unofficial Copy
This is an unofficial copy of StructBERT. It extends BERT by incorporating language structures into pre-training, offering enhanced language understanding.
🚀 Quick Start
Reproduce the HF Hub models
Download the model config, weights, and tokenizer vocab:
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_large_bert_config.json && mv ch_large_bert_config.json config.json
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_vocab.txt && mv ch_vocab.txt vocab.txt
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model && mv ch_model pytorch_model.bin
from transformers import BertConfig, BertModel, BertTokenizer
config = BertConfig.from_pretrained("./config.json")
model = BertModel.from_pretrained("./", config=config)
tokenizer = BertTokenizer.from_pretrained("./")
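# Pushing requires authentication with the Hugging Face Hub, e.g. via `huggingface-cli login`.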
model.push_to_hub("structbert-large-zh")
tokenizer.push_to_hub("structbert-large-zh")
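Once pushed, the checkpoint can be loaded straight from the Hub with the standard `transformers` API. A minimal sketch, assuming the repository ended up under your own namespace as `structbert-large-zh` (the namespace below is a placeholder):

```python
import torch
from transformers import BertModel, BertTokenizer

repo_id = "your-username/structbert-large-zh"  # placeholder namespace

tokenizer = BertTokenizer.from_pretrained(repo_id)
model = BertModel.from_pretrained(repo_id)
model.eval()

# Encode a short Chinese sentence and extract contextual embeddings.
inputs = tokenizer("今天天气不错", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (1, sequence_length, 1024) for the large model
```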
The official paper can be found at https://arxiv.org/abs/1908.04577.
✨ Features
We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word and sentence levels, respectively.
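For intuition, the word-level task corrupts a span of tokens by shuffling it and trains the model to reconstruct the original order (the paper uses trigrams), while the sentence-level task predicts whether a sentence pair appears in order, is swapped, or comes from different documents. The snippet below is only a conceptual sketch of the word-level corruption step, not the authors' pre-training code; the span length and positions are illustrative.

```python
import random
from typing import List, Tuple

def shuffle_span(token_ids: List[int], span_len: int = 3, seed: int = None) -> Tuple[List[int], List[int]]:
    """Conceptual sketch of StructBERT's word structural objective:
    permute one randomly chosen span (a trigram in the paper) and keep
    the original order as the reconstruction target.
    Illustrative only, not the official pre-training code."""
    rng = random.Random(seed)
    if len(token_ids) < span_len:
        return token_ids, token_ids
    start = rng.randrange(len(token_ids) - span_len + 1)
    original = token_ids[start:start + span_len]       # labels: original order
    shuffled = original[:]
    rng.shuffle(shuffled)
    corrupted = token_ids[:start] + shuffled + token_ids[start + span_len:]
    return corrupted, original                          # model input, labels

# Example: the model is trained to restore the original order of the shuffled span.
corrupted, labels = shuffle_span([101, 2023, 2003, 1037, 3231, 102], seed=0)
```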
📦 Installation
Requirements and Installation
- PyTorch version >= 1.0.1
- Install other libraries via
pip install -r requirements.txt
- For faster training, install NVIDIA's apex library
💻 Usage Examples
Basic Usage
Finetune MNLI
python run_classifier_multi_task.py \
--task_name MNLI \
--do_train \
--do_eval \
--do_test \
--amp_type O1 \
--lr_decay_factor 1 \
--dropout 0.1 \
--do_lower_case \
--detach_index -1 \
--core_encoder bert \
--data_dir path_to_glue_data \
--vocab_file config/vocab.txt \
--bert_config_file config/large_bert_config.json \
--init_checkpoint path_to_pretrained_model \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--fast_train \
--gradient_accumulation_steps 1 \
--output_dir path_to_output_dir
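For reference, `--task_name MNLI` points the script at the GLUE MNLI sentence-pair entailment data expected under `--data_dir`. The snippet below only illustrates what an MNLI example looks like, using the Hugging Face `datasets` library (not a dependency of this repo); the training script itself reads the GLUE TSV files from disk.

```python
from datasets import load_dataset

# Peek at a few MNLI training examples to see the input format:
# a premise/hypothesis pair with a 3-way entailment label.
mnli = load_dataset("glue", "mnli", split="train[:3]")
for example in mnli:
    print(example["premise"])
    print(example["hypothesis"])
    print(example["label"])  # 0 = entailment, 1 = neutral, 2 = contradiction
```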
📚 Documentation
Pre-trained models
| Property | Details |
|----------|---------|
| Model Type | structbert.en.large: StructBERT using the BERT-large architecture; structroberta.en.large: StructRoBERTa, continued training from RoBERTa; structbert.ch.large: Chinese StructBERT using the BERT-large architecture |
| #params | structbert.en.large: 340M; structroberta.en.large: 355M; structbert.ch.large: 330M |
| Download | structbert.en.large: structbert.en.large; structroberta.en.large: coming soon; structbert.ch.large: structbert.ch.large |
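As a rough sanity check of the parameter counts above, you can count the weights of a locally prepared checkpoint directly. A sketch, assuming the Chinese model files were downloaded and renamed as in the Quick Start section:

```python
from transformers import BertConfig, BertModel

# Assumes config.json and pytorch_model.bin from the Quick Start steps
# are present in the current directory.
config = BertConfig.from_pretrained("./config.json")
model = BertModel.from_pretrained("./", config=config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # should land near the ~330M listed above
```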
Results
The GLUE and CLUE results below can be reproduced using the hyperparameters listed in the "Usage Examples" section above.
structbert.en.large
GLUE benchmark
| Model | MNLI | QNLIv2 | QQP | SST-2 | MRPC |
|-------|------|--------|-----|-------|------|
| structbert.en.large | 86.86% | 93.04% | 91.67% | 93.23% | 86.51% |
structbert.ch.large
CLUE benchmark
| Model | CMNLI | OCNLI | TNEWS | AFQMC |
|-------|-------|-------|-------|-------|
| structbert.ch.large | 84.47% | 81.28% | 68.67% | 76.11% |
📄 License
Citation
If you use StructBERT in your work, please cite:
@article{wang2019structbert,
title={Structbert: Incorporating language structures into pre-training for deep language understanding},
author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
journal={arXiv preprint arXiv:1908.04577},
year={2019}
}
⚠️ Important Note
This model card was not produced by the AliceMind team.