bigbird-pegasus-large-pubmed open-source model - free to deploy for efficient long-document summarization

Bigbird Pegasus Large Pubmed

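Developed by Google

BigBirdPegasus is a Transformer model based on sparse attention. It can process longer sequences than a standard Transformer and is particularly suited to long-document summarization.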

Text generation

Transformers

Language: English  License: Apache-2.0  #long-document-summarization #sparse-attention #scientific-paper-processing

Downloads: 2,031

Released: 3/2/2022

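Model overview

BigBirdPegasus is a sparse-attention Transformer model that extends the standard Transformer to handle sequences of up to 4096 tokens efficiently. It performs well on tasks such as long-document summarization.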

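Model features

Sparse attention

Uses block sparse attention, which substantially lowers the compute cost of processing long sequences.

Long-sequence processing

Handles sequences of up to 4096 tokens efficiently, which suits long-document tasks.

High-quality summarization

Achieves strong ROUGE scores on scientific-paper summarization.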

Model capabilities

Long-document summarization

Scientific-paper summarization

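Use cases

Academic research

PubMed paper summarization

Generates concise, accurate summaries of PubMed scientific papers

ROUGE-1: 40.8966, ROUGE-2: 18.1161

arXiv paper summarization

Generates summaries of arXiv scientific papers

ROUGE-1: 40.3815, ROUGE-2: 14.374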

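🚀 BigBirdPegasus model (large)

BigBird is a sparse-attention-based Transformer that extends Transformer-based models such as BERT to much longer sequences. The BigBird paper also gives a theoretical account of which capabilities of a full Transformer the sparse model retains.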

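🚀 Quick start

BigBird relies on block sparse attention instead of the full attention used by models such as BERT, so it can handle sequences of up to 4096 tokens at a much lower compute cost than BERT. The BigBird authors report state-of-the-art results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.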

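✨ Key features

Uses block sparse attention to handle long sequences.
Lower compute cost than BERT's full attention.
Reported state-of-the-art results on tasks such as long-document summarization and long-context question answering.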
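
To make the cost difference concrete, the sketch below counts query-key scores for one attention head under full attention and under a simplified block sparse pattern. It is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, each query block is assumed to attend to a sliding window of 3 blocks, 2 global blocks, and `num_random_blocks` random blocks, and the special handling of edge blocks is ignored. The defaults `block_size=64` and `num_random_blocks=3` are the ones mentioned in the usage example.

```python
def attended_keys_per_query(block_size=64, num_random_blocks=3,
                            window_blocks=3, global_blocks=2):
    """Approximate number of key tokens each query token attends to.

    window_blocks and global_blocks are assumptions of this sketch.
    """
    return (window_blocks + global_blocks + num_random_blocks) * block_size


def attention_score_counts(seq_len=4096, **kwargs):
    """Return (full, sparse) counts of query-key scores for one head."""
    full = seq_len * seq_len
    # A query can never attend to more keys than the sequence contains.
    sparse = seq_len * min(seq_len, attended_keys_per_query(**kwargs))
    return full, sparse


full, sparse = attention_score_counts()
print(full, sparse, full / sparse)  # 16777216 2097152 8.0
```

Under these assumptions the sparse pattern computes about one eighth of the scores at 4096 tokens. The per-query cost stays fixed as the sequence grows, so the total grows linearly rather than quadratically.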

💻 Usage examples

Basic usage

from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")

# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

# decoder attention type can't be changed & will be "original_full"
# you can change `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors='pt')
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction)
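
For inputs that may exceed the 4096-token limit, the calls above can be wrapped in a small helper. The sketch below is hypothetical: the name `summarize` and its parameters are not part of the model card. It assumes a tokenizer and model loaded as in the basic usage example, truncates the input to 4096 tokens, and drops special tokens when decoding.

```python
def summarize(text, tokenizer, model, max_input_tokens=4096, **generate_kwargs):
    """Summarize `text`, truncating the input to `max_input_tokens` tokens.

    Hypothetical helper: `tokenizer` and `model` are expected to be the
    objects returned by the `from_pretrained` calls in the basic usage.
    """
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,              # cut off anything beyond max_length
        max_length=max_input_tokens,
    )
    output_ids = model.generate(**inputs, **generate_kwargs)
    # skip_special_tokens removes markers such as padding and end-of-sequence.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

A call such as `summarize(article_text, tokenizer, model, num_beams=4)` would forward `num_beams` to `generate`; `article_text` is a placeholder for your own document. Truncation silently discards the tail of the document, so very long papers may need to be split before summarizing.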

📚 Detailed documentation

Training procedure

This checkpoint was obtained by fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the pubmed dataset from scientific_papers.

Citation

@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Datasets and metrics

Property	Details
Model type	BigBirdPegasus (large)
Training data	pubmed configuration of the scientific_papers dataset
Task	Summarization
Evaluation datasets	scientific_papers (pubmed and arxiv configurations)
ROUGE-1 (pubmed)	40.8966
ROUGE-2 (pubmed)	18.1161
ROUGE-L (pubmed)	26.1743
ROUGE-LSUM (pubmed)	34.2773
Loss (pubmed)	2.1707184314727783
METEOR (pubmed)	0.3513
Generation length (pubmed)	221.2531
ROUGE-1 (arxiv)	40.3815
ROUGE-2 (arxiv)	14.374
ROUGE-L (arxiv)	23.4773
ROUGE-LSUM (arxiv)	33.772
Loss (arxiv)	3.235051393508911
Generation length (arxiv)	186.2003
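
As a reading aid for the ROUGE rows, the sketch below computes a simplified ROUGE-N F1 score from n-gram overlap between a reference summary and a generated one. It is not the evaluation code behind the numbers above: the function name is hypothetical, tokenization is plain lowercase whitespace splitting, and standard ROUGE tooling applies further normalization such as stemming. The function returns a value between 0 and 1, whereas the reported scores appear to be on a 0 to 100 scale.

```python
from collections import Counter


def rouge_n_f1(reference, candidate, n=1):
    """Simplified ROUGE-N F1: n-gram overlap between two texts."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(round(rouge_n_f1(reference, candidate, n=1), 4))  # 0.8333
print(round(rouge_n_f1(reference, candidate, n=2), 4))  # 0.6
```

ROUGE-2 is lower than ROUGE-1 in the example for the same reason it is lower in the table: matching two consecutive words is a stricter requirement than matching single words.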