StructBERT-large-zh open-source model: incorporates language structure to improve text processing and comprehension

Structbert Large Zh

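Developed by junnyu

StructBERT extends BERT by incorporating language structure into pre-training. Two auxiliary tasks make full use of the sequential order of words and sentences.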

Large language model

Transformers

Chinese #Chinese pre-training #structure-enhanced BERT #language understanding

Downloads: 77

Released: 5/18/2022

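Model overview

StructBERT is an improved BERT model that incorporates language structure into pre-training, strengthening language understanding at both the word and sentence levels.

Model features

Structure-aware pre-training

Pre-trained with two auxiliary tasks that exploit the sequential order of words and sentences

Deep language understanding

Captures language structure more effectively at the word and sentence levels

Large-scale pre-training

Based on the BERT-large architecture, with 330 million parameters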

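Model capabilities

Text classification

Natural language inference

Semantic similarity

Question answering

Use cases

Natural language processing

Text classification

For tasks such as news classification

Reaches 68.67% accuracy on the TNEWS dataset

Natural language inference

Determines the logical relationship between sentences

Reaches 84.47% accuracy on the CMNLI dataset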

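🚀 StructBERT: unofficial copy

StructBERT incorporates language structure into pre-training for deep language understanding. This repository provides an unofficial copy of the model, including download instructions and usage examples.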

🚀 Quick start

Reproducing the HF Hub model

Download the model weights, config, and tokenizer vocabulary:

wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_large_bert_config.json && mv ch_large_bert_config.json config.json
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_vocab.txt && mv ch_vocab.txt vocab.txt
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model && mv ch_model pytorch_model.bin

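from transformers import BertConfig, BertModel, BertTokenizer
# Load the renamed config, weights, and vocabulary from the current directory.
config = BertConfig.from_pretrained("./config.json")
model = BertModel.from_pretrained("./", config=config)
tokenizer = BertTokenizer.from_pretrained("./")
# Upload both to the Hub (requires a prior login, e.g. `huggingface-cli login`).
model.push_to_hub("structbert-large-zh")
tokenizer.push_to_hub("structbert-large-zh")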
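Before loading, it can help to confirm that the downloaded files carry the names Transformers looks for in a local directory. The sketch below assumes the default filenames `config.json`, `vocab.txt`, and `pytorch_model.bin`; the helper name `missing_checkpoint_files` is hypothetical, and newer Transformers releases may also accept other weight formats that this check ignores.

```python
from pathlib import Path

# Default filenames assumed for a local BERT checkpoint directory.
EXPECTED_FILES = ("config.json", "vocab.txt", "pytorch_model.bin")


def missing_checkpoint_files(directory="."):
    """Return the expected filenames that are absent from `directory`."""
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

If `vocab.txt` is reported missing, the vocabulary was probably saved under its upstream name `ch_vocab.txt` and needs renaming.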

Paper: https://arxiv.org/abs/1908.04577

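✨ Key features

We extend BERT to a new model, StructBERT, by incorporating language structure into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structure at the word level and the sentence level respectively.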
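The two auxiliary tasks can be illustrated with a toy data-preparation sketch. The details are assumptions taken from the StructBERT paper rather than from this page: the word-level task shuffles a small fraction of trigrams (5% in the paper) and asks the model to restore the original order, and the sentence-level task labels a sentence pair as next, previous, or drawn from another document. The names `shuffle_trigrams` and `make_sentence_pair` are hypothetical, the sampling is only approximate, and BERT's token masking is omitted.

```python
import random


def shuffle_trigrams(tokens, rate=0.05, rng=random):
    """Word-level task: shuffle non-overlapping trigrams.

    Returns (corrupted, targets), where targets maps each position inside a
    shuffled trigram to the token that originally stood there.
    """
    corrupted = list(tokens)
    targets = {}
    i = 0
    while i + 3 <= len(corrupted):
        if rng.random() < rate:
            original = corrupted[i:i + 3]
            shuffled = original[:]
            rng.shuffle(shuffled)  # may occasionally leave the order unchanged
            corrupted[i:i + 3] = shuffled
            for offset, token in enumerate(original):
                targets[i + offset] = token
            i += 3  # skip ahead so trigrams never overlap
        else:
            i += 1
    return corrupted, targets


def make_sentence_pair(doc, index, other_doc, rng=random):
    """Sentence-level task: build (first, second, label) for a 3-way classifier.

    label 0: second follows first; 1: second precedes first;
    2: second comes from another document.
    Requires 1 <= index <= len(doc) - 2.
    """
    label = rng.randrange(3)
    first = doc[index]
    if label == 0:
        second = doc[index + 1]
    elif label == 1:
        second = doc[index - 1]
    else:
        second = rng.choice(other_doc)
    return first, second, label
```

In training, the model would be asked to predict `targets` from `corrupted` and the label from the sentence pair.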

📦 Installation

Requirements and installation

PyTorch version >= 1.0.1
Install the other libraries with:

pip install -r requirements.txt

For faster training, install NVIDIA's apex library.

💻 Usage examples

Basic usage

Fine-tuning on the MNLI task

python run_classifier_multi_task.py \
  --task_name MNLI \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir path_to_glue_data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint path_to_pretrained_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir path_to_output_dir
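MNLI is a sentence-pair task, so each premise and hypothesis must fit together within `--max_seq_length 128`. The sketch below shows the input layout used by the original BERT classifier scripts: `[CLS] premise [SEP] hypothesis [SEP]`, with the longer side trimmed first. Whether `run_classifier_multi_task.py` follows exactly this scheme is an assumption; `truncate_pair` and `build_pair_input` are hypothetical helper names, and the inputs are assumed to be already tokenized.

```python
def truncate_pair(tokens_a, tokens_b, max_tokens):
    """Trim the longer sequence one token at a time until the pair fits."""
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    while len(tokens_a) + len(tokens_b) > max_tokens:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        longer.pop()
    return tokens_a, tokens_b


def build_pair_input(premise, hypothesis, max_seq_length=128):
    """Return (tokens, segment_ids) for a BERT-style sentence-pair input."""
    # Reserve three positions for [CLS] and the two [SEP] markers.
    a, b = truncate_pair(premise, hypothesis, max_seq_length - 3)
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment 0 covers [CLS], the premise, and the first [SEP];
    # segment 1 covers the hypothesis and the final [SEP].
    segment_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    tokens, segment_ids = build_pair_input(["the", "cat"], ["a", "pet"])
    print(tokens)
    print(segment_ids)
```

Sequences shorter than the limit would additionally be padded in a real pipeline; padding is left out to keep the sketch short.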

📚 Documentation

Pre-trained models

Model	Description	Parameters	Download link
structbert.en.large	StructBERT using the BERT-large architecture	340M	structbert.en.large
structroberta.en.large	StructRoBERTa, continued training from RoBERTa	355M	Coming soon
structbert.ch.large	Chinese StructBERT; BERT-large architecture	330M	structbert.ch.large
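The different parameter counts listed for the English and Chinese large models are consistent with a difference in vocabulary size alone. The back-of-the-envelope count below assumes standard BERT-large hyperparameters (24 layers, hidden size 1024, feed-forward size 4096, 512 positions, 2 segment types) and vocabulary sizes of 30,522 for English and 21,128 for Chinese. The vocabulary sizes are assumptions based on Google's BERT releases, not values read from the StructBERT config files, and `bert_param_count` is a hypothetical helper that counts only the encoder and pooler, not any pre-training heads.

```python
def bert_param_count(vocab_size, hidden=1024, layers=24, intermediate=4096,
                     max_positions=512, type_vocab=2):
    """Count weights and biases of a BERT encoder with a pooler."""
    layer_norm = 2 * hidden  # scale and bias
    # Token, position, and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections, followed by a LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers, followed by a LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    for name, vocab_size in (("English", 30522), ("Chinese", 21128)):
        total = bert_param_count(vocab_size)
        print(f"{name}: {total:,} parameters ({total / 1e6:.1f}M)")
```

Under these assumptions the two totals differ by under 10 million parameters, all of it in the token embedding table, which matches the rounded figures in the table.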

Results

The GLUE and CLUE results can be reproduced with the hyperparameters listed in the "Usage examples" section.

structbert.en.large

GLUE benchmark

Model	MNLI	QNLIv2	QQP	SST-2	MRPC
structbert.en.large	86.86%	93.04%	91.67%	93.23%	86.51%

structbert.ch.large

CLUE benchmark

Model	CMNLI	OCNLI	TNEWS	AFQMC
structbert.ch.large	84.47%	81.28%	68.67%	76.11%

📄 License

Disclaimer

This model card was not written by the AliceMind team.

Citation

If you use our work, please cite:

@article{wang2019structbert,
  title={Structbert: Incorporating language structures into pre-training for deep language understanding},
  author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
  journal={arXiv preprint arXiv:1908.04577},
  year={2019}
}

Official repository: https://github.com/alibaba/AliceMind/tree/main/StructBERT