bigbird-pegasus-large-bigpatent open-source model: long-document summarization with sequences of up to 4096 tokens


Bigbird Pegasus Large Bigpatent

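Developed by Google

BigBird is a sparse-attention Transformer model that can handle sequences of up to 4096 tokens, which makes it suitable for tasks such as long-document summarization.

Text generation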

Transformers

Language: English | License: Apache-2.0 | #long-text-summarization #sparse-attention #4096-token-sequences

Downloads: 945

Released: 3/2/2022

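Model overview

BigBird replaces regular attention with block sparse attention, so it can process long sequences at a lower computational cost. It performs well on tasks such as long-document summarization.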

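Model features

Block sparse attention

Uses block sparse attention instead of regular attention, which significantly reduces the computational cost of processing long sequences.

Long-sequence processing

Efficiently handles sequences of up to 4096 tokens, which suits long-document tasks.

Flexible configuration

The block size and the number of random blocks can be adjusted to trade model quality against compute.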
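To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes that each query block attends to 2 global blocks, a sliding window of 3 blocks, and the configured number of random blocks, which is how the Hugging Face implementation is commonly described. The helper names `keys_per_query` and `attention_fraction` are hypothetical and are not part of any library.

```python
def keys_per_query(block_size: int, num_random_blocks: int) -> int:
    """Approximate number of key positions a single query attends to.

    Assumed layout: 2 global blocks + 3 sliding-window blocks + random blocks.
    """
    return (2 + 3 + num_random_blocks) * block_size


def attention_fraction(seq_len: int, block_size: int, num_random_blocks: int) -> float:
    """Share of the full seq_len x seq_len attention that is actually computed."""
    return min(1.0, keys_per_query(block_size, num_random_blocks) / seq_len)


seq_len = 4096
for block_size, num_random_blocks in [(64, 3), (16, 2)]:
    keys = keys_per_query(block_size, num_random_blocks)
    share = attention_fraction(seq_len, block_size, num_random_blocks)
    print(f"block_size={block_size}, num_random_blocks={num_random_blocks}: "
          f"{keys} keys per query, {share:.1%} of full attention")
```

Smaller blocks and fewer random blocks lower the cost further, but each token then sees less context, so the right setting depends on the task. The estimate ignores the global blocks themselves, which attend to the whole sequence.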

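Model capabilities

Long-text summary generation

Long-context understanding

Use cases

Document processing

Patent document summarization

Generates concise summaries of long patent documents

Obtained by fine-tuning on the big_patent dataset

Long-document question answering

Answers questions based on the content of a long document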

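🚀 BigBirdPegasus model (large)

BigBirdPegasus is a sparse-attention Transformer that extends Transformer-based models such as BERT to much longer sequences. BigBird also comes with a theoretical account of which capabilities of a full Transformer the sparse model retains.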

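🚀 Quick start

Model description

BigBird relies on block sparse attention instead of normal attention (such as BERT's attention), so it can handle sequences of up to 4096 tokens at a much lower compute cost than BERT. At the time of its release it reported state-of-the-art results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.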
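As an illustration of what "block sparse" means, the sketch below builds a block-level attention layout in plain Python. It is a simplified model, not the library's implementation: it assumes the first and last blocks are global, every other block sees its immediate neighbours, and a few extra blocks are picked at random. The function name `block_sparse_layout` and the `seed` parameter are hypothetical.

```python
import random


def block_sparse_layout(num_blocks: int, num_random_blocks: int = 3, seed: int = 0):
    """Return, for each query block, the set of key blocks it attends to."""
    rng = random.Random(seed)
    last = num_blocks - 1
    layout = []
    for q in range(num_blocks):
        if q in (0, last):
            # Global blocks attend to the whole sequence.
            layout.append(set(range(num_blocks)))
            continue
        attended = {0, last}                # global blocks
        attended.update({q - 1, q, q + 1})  # sliding window of 3 blocks
        # Random blocks are drawn from whatever is not covered yet.
        candidates = [k for k in range(num_blocks) if k not in attended]
        picks = min(num_random_blocks, len(candidates))
        attended.update(rng.sample(candidates, picks))
        layout.append(attended)
    return layout


# 4096 tokens with block_size=64 gives 64 blocks.
layout = block_sparse_layout(num_blocks=4096 // 64)
density = sum(len(row) for row in layout) / (len(layout) ** 2)
print(f"query block 10 attends to key blocks {sorted(layout[10])}")
print(f"share of block pairs computed: {density:.1%}")
```

Every query block keeps a path to every other block through the global blocks, which is the intuition behind sparse attention retaining much of the reach of full attention.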

How to use

The following example shows how to use this model in PyTorch to generate a summary of a given text:

from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-bigpatent")

# by default, encoder attention is `block_sparse` with num_random_blocks=3, block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent")

# each `from_pretrained` call below reloads the weights; keep only the variant you need
# the decoder attention type can't be changed and is always "original_full"
# alternative: switch `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent", attention_type="original_full")

# alternative: change `block_size` and `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-bigpatent", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
# truncate the input to the 4096-token limit stated for this model
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
prediction = model.generate(**inputs)
# drop special tokens such as padding and end-of-sequence from the decoded summary
prediction = tokenizer.batch_decode(prediction, skip_special_tokens=True)
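One practical caveat: block sparse attention only applies to sufficiently long inputs. As far as I understand the Hugging Face implementation, inputs of at most `(5 + 2 * num_random_blocks) * block_size` tokens are processed with "original_full" attention instead (a warning is logged), and longer inputs are padded up to a multiple of `block_size`. The sketch below encodes that rule as an assumption so you can check an input length before running the model. The helper names are hypothetical, and the exact threshold should be verified against your installed version of the library.

```python
def fallback_threshold(block_size: int = 64, num_random_blocks: int = 3) -> int:
    """Longest input (in tokens) assumed to fall back to full attention."""
    # 2 global + 3 sliding-window blocks, plus random blocks and an equal buffer.
    return (5 + 2 * num_random_blocks) * block_size


def uses_block_sparse(seq_len: int, block_size: int = 64, num_random_blocks: int = 3) -> bool:
    """True if an input of seq_len tokens is long enough for block sparse attention."""
    return seq_len > fallback_threshold(block_size, num_random_blocks)


def padded_length(seq_len: int, block_size: int = 64) -> int:
    """Round seq_len up to the next multiple of block_size."""
    return -(-seq_len // block_size) * block_size


for seq_len in (12, 704, 1000, 4096):
    print(seq_len, uses_block_sparse(seq_len), padded_length(seq_len))
```

Under this assumption, a one-sentence sample text is short enough to run with full attention, so the savings from sparse attention only show up on genuinely long documents.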

Training procedure

This checkpoint was obtained by fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the big_patent dataset.

BibTeX entry and citation info

@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}