bigbird-pegasus-large-bigpatent open-source model: long-document summarization with sequences of up to 4096 tokens

Bigbird Pegasus Large Bigpatent

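Developed by Google

BigBird is a sparse-attention-based Transformer model that can handle sequences of up to 4096 tokens, which makes it suitable for tasks such as long-document summarization.

Text generation

Transformers

Language: English

License: Apache-2.0

Tags: #long-text-summarization #sparse-attention #4096-sequence-length

Downloads: 945

Released: 3/2/2022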

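Model overview

BigBird replaces regular full attention with block sparse attention, so it can process long sequences at a lower compute cost and performs well on tasks such as long-document summarization.

Model features

Block sparse attention

Block sparse attention is used in place of regular attention, which substantially reduces the compute cost of processing long sequences.

Long-sequence handling

Handles sequences of up to 4096 tokens efficiently, which suits long-document tasks.

Flexible configuration

The block size and the number of random blocks can be adjusted to trade off quality against compute resources.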
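The block sparse pattern described above can be sketched at block level in plain Python. This is a toy illustration, not the library's implementation: the function name `block_sparse_pattern`, the choice of the first and last blocks as global blocks, and the three-block sliding window are assumptions made for the sketch.

```python
import random


def block_sparse_pattern(num_blocks, num_random_blocks=3, seed=0):
    """Return, for each query block, the sorted key-block indices it attends to.

    Toy illustration only (hypothetical helper): global blocks are the first
    and last block, the sliding window is the block itself plus one neighbour
    on each side, and random blocks are drawn from the remaining blocks.
    """
    rng = random.Random(seed)
    global_blocks = {0, num_blocks - 1}
    pattern = []
    for q in range(num_blocks):
        if q in global_blocks:
            # Global query blocks attend to every key block.
            pattern.append(list(range(num_blocks)))
            continue
        window = {b for b in (q - 1, q, q + 1) if 0 <= b < num_blocks}
        attended = global_blocks | window
        candidates = [b for b in range(num_blocks) if b not in attended]
        k = min(num_random_blocks, len(candidates))
        attended |= set(rng.sample(candidates, k))
        pattern.append(sorted(attended))
    return pattern


# 4096 tokens with block_size=64 gives 64 blocks.
pattern = block_sparse_pattern(num_blocks=4096 // 64, num_random_blocks=3)
print(pattern[10])  # 2 global + 3 window + 3 random block indices
```

Lowering `num_random_blocks` shrinks each row of the pattern, which is the kind of quality-versus-compute trade-off the configuration options expose.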

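Model capabilities

Long-text summary generation

Long-context understanding

Use cases

Document processing

Patent document summarization

Generates concise summaries of long patent documents

Obtained by fine-tuning on the big_patent dataset

Long-document question answering

Answers questions based on the content of long documents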

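🚀 BigBirdPegasus model (large)

BigBird is a sparse-attention-based Transformer that extends Transformer-based models such as BERT to much longer sequences. The BigBird paper also provides a theoretical analysis of which capabilities of a full Transformer the sparse model retains.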

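🚀 Quick start

Model description

BigBird relies on block sparse attention rather than ordinary full attention (as used in BERT), so it can handle sequences of up to 4096 tokens at a much lower compute cost than BERT. The BigBird paper reports state-of-the-art results on various tasks involving very long sequences, such as long-document summarization and question answering over long contexts.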
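To make the cost difference concrete, here is a rough back-of-the-envelope estimate in Python. It assumes each query block sees 3 sliding-window blocks, 2 global blocks, and `num_random_blocks` random blocks, and it does not special-case the edge blocks, so treat the numbers as an approximation rather than an exact operation count. The helper name `attention_score_counts` is hypothetical.

```python
def attention_score_counts(seq_len, block_size=64, num_random_blocks=3):
    """Roughly count query-key scores for full vs. block sparse attention.

    Assumption: every query attends to (3 sliding + 2 global +
    num_random_blocks random) key blocks; edge blocks are not special-cased.
    """
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be a multiple of block_size")
    full = seq_len * seq_len
    keys_per_query = (3 + 2 + num_random_blocks) * block_size
    # A query can never attend to more keys than the sequence contains.
    sparse = seq_len * min(keys_per_query, seq_len)
    return full, sparse


full, sparse = attention_score_counts(4096)
print(full, sparse, full / sparse)  # 16777216 2097152 8.0
```

Under these assumptions the sparse cost grows linearly with sequence length, while full attention grows quadratically.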

How to use

The following example shows how to use this model in PyTorch to generate a summary of a given text:

from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-bigpatent")

# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent")

# decoder attention type can't be changed & will be "original_full"
# you can change `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent", block_size=16, num_random_blocks=2)

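text = "Replace me by any text you'd like."
# truncate inputs that exceed the 4096-token limit
inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=4096)
prediction = model.generate(**inputs)
# drop special tokens (padding, end-of-sequence) from the decoded summary
prediction = tokenizer.batch_decode(prediction, skip_special_tokens=True)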
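One practical detail when experimenting with short inputs such as the placeholder text above: as far as I recall, the Hugging Face implementation falls back to full attention when the input is not longer than `(5 + 2 * num_random_blocks) * block_size` tokens. The sketch below encodes that rule as an assumption; the helper names are hypothetical, and the threshold should be verified against the installed `transformers` version.

```python
def min_tokens_for_block_sparse(block_size=64, num_random_blocks=3):
    """Assumed threshold: 2 global + 3 sliding + 2 * num_random_blocks blocks."""
    return (5 + 2 * num_random_blocks) * block_size


def uses_block_sparse(seq_len, block_size=64, num_random_blocks=3):
    """True if an input of seq_len tokens would stay on block sparse attention
    under the assumed rule (inputs must be strictly longer than the threshold)."""
    return seq_len > min_tokens_for_block_sparse(block_size, num_random_blocks)


print(min_tokens_for_block_sparse())       # 704 with the default settings
print(min_tokens_for_block_sparse(16, 2))  # 144 with block_size=16, num_random_blocks=2
print(uses_block_sparse(4096))             # True
```

If this rule holds, a one-sentence input is processed with full attention regardless of `attention_type`, so cost comparisons are only meaningful on genuinely long documents.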

Training procedure

This checkpoint was obtained by fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the big_patent dataset.

BibTeX entry and citation info

@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}