BigBirdPegasus model (large)
BigBirdPegasus is a large-scale model based on the BigBird architecture. It extends Transformer-based models to handle much longer sequences, offering significant advantages in tasks such as long-document summarization.
Quick Start
Basic Usage
Here is how to use this model to summarize a piece of text in PyTorch:
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")

# by default the encoder attention type is `block_sparse` (block_size=64, num_random_blocks=3)
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

# you can switch `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors='pt')
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction)
Features
- Block Sparse Attention: BigBird relies on block sparse attention instead of normal full attention (i.e., BERT's attention). This allows it to handle sequences of up to 4096 tokens at a much lower compute cost than BERT (see the summarization sketch after this list).
- SOTA Performance: It has achieved state-of-the-art (SOTA) results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
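The block sparse mechanism is what makes it practical to feed an entire article to the model in one pass. Below is a minimal sketch of long-document summarization with this checkpoint; the generation settings (beam count, summary length) are illustrative assumptions, not the configuration used to produce the metrics reported further down.

from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

long_document = "..."  # placeholder: a scientific article or other long text

# truncate the input at BigBird's 4096-token limit
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=4096)

# generation hyperparameters here are illustrative, not the settings behind the reported scores
summary_ids = model.generate(**inputs, num_beams=5, max_length=256)
summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]
print(summary)

Inputs longer than 4096 tokens are simply truncated in this sketch, so very long papers would need to be chunked separately.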
Documentation
Model description
BigBird extends Transformer-based models such as BERT. It was introduced in the paper Big Bird: Transformers for Longer Sequences (Zaheer et al., cited in the BibTeX entry below) and first released in the authors' repository. The model uses block sparse attention, enabling it to process long sequences more efficiently than full attention.
Training Procedure
This checkpoint was obtained by fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the pubmed subset of the scientific_papers dataset.
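The exact fine-tuning recipe is not documented in this card. The sketch below shows one plausible way such a run could be set up with the datasets library and Seq2SeqTrainer; every hyperparameter, the output path, and the preprocessing choices are assumptions for illustration only.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BigBirdPegasusForConditionalGeneration,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

# the pubmed subset pairs full articles with their abstracts
dataset = load_dataset("scientific_papers", "pubmed")

def preprocess(batch):
    model_inputs = tokenizer(batch["article"], truncation=True, max_length=4096)
    labels = tokenizer(text_target=batch["abstract"], truncation=True, max_length=256)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="bigbird-pegasus-pubmed-finetuned",  # placeholder path
    per_device_train_batch_size=1,  # long inputs; batch size kept small for memory
    learning_rate=5e-5,             # placeholder value
    num_train_epochs=1,             # placeholder value
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()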
License
This model is licensed under the Apache-2.0 license.
Model Metrics
| Property | Details |
|----------|---------|
| Model Type | BigBirdPegasus (large) |
| Training Data | scientific_papers (pubmed, arxiv) |
Pubmed Dataset
| Metric | Value |
|--------|-------|
| ROUGE-1 | 40.8966 |
| ROUGE-2 | 18.1161 |
| ROUGE-L | 26.1743 |
| ROUGE-LSUM | 34.2773 |
| loss | 2.1707184314727783 |
| meteor | 0.3513 |
| gen_len | 221.2531 |
Arxiv Dataset
| Metric | Value |
|--------|-------|
| ROUGE-1 | 40.3815 |
| ROUGE-2 | 14.374 |
| ROUGE-L | 23.4773 |
| ROUGE-LSUM | 33.772 |
| loss | 3.235051393508911 |
| gen_len | 186.2003 |
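The card does not state how these scores were computed. As a rough illustration, ROUGE values in this style can be obtained with the evaluate library; the inputs below are placeholders, and the exact scoring setup behind the tables above may differ.

import evaluate

rouge = evaluate.load("rouge")

# placeholders: model-generated summaries and the matching reference abstracts
predictions = ["the model's generated summary ..."]
references = ["the reference abstract ..."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # contains 'rouge1', 'rouge2', 'rougeL', 'rougeLsum'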
BibTeX entry and citation info
@misc{zaheer2021big,
title={Big Bird: Transformers for Longer Sequences},
author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
year={2021},
eprint={2007.14062},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Important Note
The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.