StructBERT-large-zh open-source model - incorporating language structures into text processing to improve language understanding

Structbert Large Zh

Uploaded by junnyu

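StructBERT is a new model that extends BERT by incorporating language structures into the pre-training process, using two auxiliary tasks to make full use of the sequential order of words and sentences.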

Large language model

Transformers

Chinese #Chinese pre-training #structure-enhanced BERT #language understanding

Downloads: 77

Release date: 5/18/2022

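Model overview

StructBERT is an improved BERT model that incorporates language structures into pre-training, which improves language understanding at both the word and the sentence level.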

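Model features

Structure-aware pre-training

Pre-trained with two auxiliary tasks that exploit the sequential order of words and sentences

Deep language understanding

Better captures language structure at the word and sentence levels

Large-scale pre-training

Based on the BERT-large architecture, with 330 million parameters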
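The 330M figure can be checked with a back-of-the-envelope count. The sketch below assumes the standard BERT-large hyperparameters (24 layers, hidden size 1024, feed-forward size 4096, 512 positions, 2 token types) and counts a plain BERT encoder plus pooler, without any task head. The vocabulary sizes are assumptions, not values read from ch_large_bert_config.json: 21128 is the size of Google's Chinese BERT vocabulary and 30522 that of Google's English BERT vocabulary.

```python
def bert_param_count(vocab_size: int,
                     hidden: int = 1024,
                     layers: int = 24,
                     intermediate: int = 4096,
                     max_positions: int = 512,
                     type_vocab: int = 2) -> int:
    """Count the parameters of a BERT encoder plus pooler (no task head)."""

    def dense(n_in: int, n_out: int) -> int:
        # Weight matrix plus bias vector
        return n_in * n_out + n_out

    layer_norm = 2 * hidden  # gain and bias

    # Word, position and token-type embeddings, followed by one LayerNorm
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    per_layer = (
        3 * dense(hidden, hidden)        # query, key and value projections
        + dense(hidden, hidden)          # attention output projection
        + layer_norm                     # post-attention LayerNorm
        + dense(hidden, intermediate)    # feed-forward up-projection
        + dense(intermediate, hidden)    # feed-forward down-projection
        + layer_norm                     # post-feed-forward LayerNorm
    )

    pooler = dense(hidden, hidden)  # dense layer applied to the [CLS] vector
    return embeddings + layers * per_layer + pooler


if __name__ == "__main__":
    for name, vocab in [("Chinese (assumed vocab 21128)", 21128),
                        ("English (assumed vocab 30522)", 30522)]:
        total = bert_param_count(vocab)
        print(f"{name}: {total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the counts are 325,522,432 and 335,141,888, consistent with the rounded 330M and 340M figures in the pre-trained model table. The transformer layers dominate, so the exact vocabulary size shifts the total only slightly.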

Model capabilities

Text classification

Natural language inference

Semantic similarity computation

Question answering

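Use cases

Natural language processing

Text classification

Used for tasks such as news classification

Achieves 68.67% accuracy on the TNEWS dataset

Natural language inference

Determines the logical relationship between two sentences

Achieves 84.47% accuracy on the CMNLI dataset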
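Inference tasks such as CMNLI feed the model two sentences at once. The sketch below shows the standard BERT sentence-pair layout ([CLS] A [SEP] B [SEP], segment IDs 0 and 1) with the longest-first truncation used in Google's original BERT code. It assumes this checkpoint follows the same convention, which is plausible because it loads with BertTokenizer. Splitting Chinese text into single characters stands in for the real tokenizer, and the function name build_pair_input is hypothetical.

```python
from typing import List, Tuple


def build_pair_input(tokens_a: List[str], tokens_b: List[str],
                     max_seq_length: int) -> Tuple[List[str], List[int]]:
    """Pack two token lists into a BERT-style sentence-pair input (hypothetical helper)."""
    a, b = list(tokens_a), list(tokens_b)
    budget = max_seq_length - 3  # room for [CLS] and two [SEP] tokens
    # Longest-first truncation: drop the last token of the longer sequence
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and the first [SEP]; segment 1 the rest
    segment_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    # Character-level split as a simplification of Chinese tokenization
    premise = list("他在看书")     # "He is reading a book"
    hypothesis = list("他在阅读")  # "He is reading"
    tokens, segments = build_pair_input(premise, hypothesis, max_seq_length=128)
    print(tokens)
    print(segments)
```

A classifier head on top of the [CLS] representation would then predict one of the inference labels (entailment, neutral or contradiction).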

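🚀 StructBERT: unofficial copy

StructBERT is a model that incorporates language structures into pre-training for deep language understanding. This repository is an unofficial copy of the model and covers how to download it, usage examples, and related material.

🚀 Quick start

Reproducing the HF Hub model

Download the model config, tokenizer vocabulary, and weights: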

wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_large_bert_config.json && mv ch_large_bert_config.json config.json
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_vocab.txt && mv ch_vocab.txt vocab.txt
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model && mv ch_model pytorch_model.bin

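# Load the downloaded files from the current directory
from transformers import BertConfig, BertModel, BertTokenizer
config = BertConfig.from_pretrained("./config.json")
model = BertModel.from_pretrained("./", config=config)  # reads ./pytorch_model.bin
tokenizer = BertTokenizer.from_pretrained("./")  # expects the vocabulary at ./vocab.txt
# Uploading requires prior authentication, e.g. `huggingface-cli login`
model.push_to_hub("structbert-large-zh")
tokenizer.push_to_hub("structbert-large-zh")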
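Before calling from_pretrained, it can help to confirm that the directory holds the files the quick start relies on: config.json, pytorch_model.bin (the legacy PyTorch weights file name in transformers), and vocab.txt (BertTokenizer's default vocabulary file name). The helper missing_model_files below is a hypothetical, standard-library-only check, not part of transformers.

```python
from pathlib import Path
from typing import List

# File names used by this quick start; vocab.txt is BertTokenizer's default
REQUIRED_FILES = ("config.json", "pytorch_model.bin", "vocab.txt")


def missing_model_files(model_dir: str) -> List[str]:
    """Return the required files that are absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files present; ready for from_pretrained")
```

If vocab.txt is reported missing while ch_vocab.txt is present, renaming the vocabulary file is enough for BertTokenizer.from_pretrained("./") to find it.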

Paper: https://arxiv.org/abs/1908.04577

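✨ Key features

We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word level and the sentence level, respectively.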
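To make the two auxiliary tasks concrete, here is a sketch of how their training examples could be built, following our reading of the StructBERT paper. The word structural objective shuffles a span of three tokens (a trigram) and trains the model to recover the original order; the sentence structural objective asks whether the second sentence is the next sentence, the previous sentence, or a sentence from another document, each case sampled with probability 1/3. Masking, the model, and the losses are omitted, and the function names and label constants are hypothetical rather than taken from the official code.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical labels for the sentence structural objective
NEXT, PREVIOUS, RANDOM = 0, 1, 2


def shuffle_trigram(tokens: Sequence[str], start: int,
                    rng: random.Random) -> Tuple[List[str], List[str]]:
    """Word structural objective: shuffle tokens[start:start + 3].

    Returns the corrupted sequence and the original trigram, which the
    model would be trained to reconstruct at those three positions.
    """
    if start < 0 or start + 3 > len(tokens):
        raise ValueError("trigram must lie inside the sequence")
    original = list(tokens[start:start + 3])
    shuffled = original[:]
    # Sketch choice: reshuffle until the order changes (skipped if all tokens are equal)
    while shuffled == original and len(set(original)) > 1:
        rng.shuffle(shuffled)
    corrupted = list(tokens[:start]) + shuffled + list(tokens[start + 3:])
    return corrupted, original


def make_sentence_pair(doc: Sequence[str], i: int, other_doc: Sequence[str],
                       rng: random.Random) -> Tuple[str, str, int]:
    """Sentence structural objective: build (S1, S2, label) around doc[i]."""
    if not 0 < i < len(doc) - 1:
        raise ValueError("doc[i] needs both a previous and a next sentence")
    s1 = doc[i]
    r = rng.random()
    if r < 1 / 3:
        return s1, doc[i + 1], NEXT                 # S2 follows S1
    if r < 2 / 3:
        return s1, doc[i - 1], PREVIOUS             # S2 precedes S1
    return s1, rng.choice(list(other_doc)), RANDOM  # S2 from another document


if __name__ == "__main__":
    rng = random.Random(0)
    corrupted, target = shuffle_trigram(list("结构感知预训练"), 2, rng)
    print(corrupted, target)
    doc = ["第一句。", "第二句。", "第三句。"]  # "First/second/third sentence."
    print(make_sentence_pair(doc, 1, ["无关的句子。"], rng))  # "An unrelated sentence."
```

The previous-sentence case is what distinguishes this three-way task from BERT's binary next-sentence prediction: the model must reason about sentence order, not only about whether two sentences belong together.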

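📦 Installation

Requirements and installation

PyTorch version >= 1.0.1
Install the other libraries (listed in requirements.txt in the official repository) with:

pip install -r requirements.txt

For faster training, install NVIDIA's apex library.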

💻 Usage examples

Basic usage

Fine-tuning on MNLI

python run_classifier_multi_task.py \
  --task_name MNLI \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir path_to_glue_data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint path_to_pretrained_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir path_to_output_dir
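As a rough guide to the length of this run, the number of optimizer updates follows from the dataset size and the batch flags. The sketch assumes one update per train_batch_size × gradient_accumulation_steps examples with the final partial batch kept, and uses 392,702 as the size of the MNLI training set; the script's actual batching may differ. The function name optimizer_steps is hypothetical.

```python
import math


def optimizer_steps(num_examples: int, train_batch_size: int,
                    gradient_accumulation_steps: int, num_epochs: int) -> int:
    """Number of optimizer updates, keeping the final partial batch (hypothetical helper)."""
    examples_per_update = train_batch_size * gradient_accumulation_steps
    steps_per_epoch = math.ceil(num_examples / examples_per_update)
    return steps_per_epoch * num_epochs


if __name__ == "__main__":
    MNLI_TRAIN_EXAMPLES = 392_702  # size of the MNLI training split
    # Flag values taken from the fine-tuning command
    steps = optimizer_steps(MNLI_TRAIN_EXAMPLES, train_batch_size=32,
                            gradient_accumulation_steps=1, num_epochs=3)
    print(steps)
```

With the flags above this comes to 12,272 updates per epoch and 36,816 in total. Raising --gradient_accumulation_steps while lowering --train_batch_size keeps the effective batch size, and therefore the step count, unchanged.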

📚 Documentation

Pre-trained models

Model	Description	Parameters	Download
structbert.en.large	StructBERT with the BERT-large architecture	340M	structbert.en.large
structroberta.en.large	StructRoBERTa, continually trained from RoBERTa	355M	Coming soon
structbert.ch.large	Chinese StructBERT; BERT-large architecture	330M	structbert.ch.large

Experimental results

Results on the GLUE and CLUE tasks can be reproduced with the hyperparameters listed in the "Usage examples" section above.

structbert.en.large

GLUE benchmark

Model	MNLI	QNLIv2	QQP	SST-2	MRPC
structbert.en.large	86.86%	93.04%	91.67%	93.23%	86.51%

structbert.ch.large

CLUE benchmark

Model	CMNLI	OCNLI	TNEWS	AFQMC
structbert.ch.large	84.47%	81.28%	68.67%	76.11%

📄 License

Disclaimer

This model card was not written by the AliceMind team.

Citation

If you use our work, please cite:

@article{wang2019structbert,
  title={Structbert: Incorporating language structures into pre-training for deep language understanding},
  author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
  journal={arXiv preprint arXiv:1908.04577},
  year={2019}
}

Official repository: https://github.com/alibaba/AliceMind/tree/main/StructBERT