BigBirdPegasus model (large)
BigBirdPegasus is a sparse-attention based transformer that extends Transformer models to handle much longer sequences, excelling at long-document summarization and other long-input tasks.
Quick Start
BigBird is a sparse-attention based transformer that extends Transformer-based models such as BERT to much longer sequences. It also comes with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle. BigBird was introduced in this paper and first released in this repository.
Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.
Features
BigBird relies on block sparse attention instead of normal attention (e.g., BERT's full attention). It can handle sequences up to a length of 4096 at a much lower compute cost than BERT, and it has achieved state-of-the-art (SOTA) results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
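As a minimal sketch of what this means in practice, the sparse-attention settings are exposed on the model configuration. The snippet below only inspects the checkpoint's config; the commented values reflect the library defaults for this model and are shown for illustration.

from transformers import BigBirdPegasusConfig

# Inspect the sparse-attention settings of this checkpoint
config = BigBirdPegasusConfig.from_pretrained("google/bigbird-pegasus-large-arxiv")
print(config.attention_type)            # "block_sparse" by default
print(config.max_position_embeddings)   # 4096, the maximum sequence length
print(config.block_size, config.num_random_blocks)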
Usage Examples
Basic Usage
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")

# by default the encoder attention type is `block_sparse`, with block_size=64 and num_random_blocks=3
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")

# you can switch to full attention by setting `attention_type`
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv", attention_type="original_full")

# or change `block_size` and `num_random_blocks` for the sparse attention
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors="pt")
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction)
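For a long document, the input can be truncated to the model's 4096-token limit before generation. The sketch below builds on the objects defined above; the file name and the num_beams and max_length values are illustrative placeholders, not settings from the model card.

# Hypothetical end-to-end example (file path and generation settings are placeholders)
long_text = open("paper.txt").read()
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, num_beams=5, max_length=256)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])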
Documentation
This checkpoint is obtained after fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the arxiv subset of the scientific_papers dataset.
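As a sketch, the same data can be loaded with the datasets library (assuming it is installed and the dataset script is available in your version); the field names below are those documented for scientific_papers.

from datasets import load_dataset

# Load the arxiv configuration of scientific_papers; each example has
# "article", "abstract", and "section_names" fields
ds = load_dataset("scientific_papers", "arxiv", split="test")
print(ds[0]["abstract"])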
License
This model is released under the Apache 2.0 license.
Model Metrics
| Property | Details |
|----------|---------|
| Model Name | google/bigbird-pegasus-large-arxiv |
| Datasets | scientific_papers, cnn_dailymail, xsum, samsum |
| Task | Summarization |
| Metrics on scientific_papers (pubmed) | ROUGE-1: 36.0276, ROUGE-2: 13.4166, ROUGE-L: 21.9612, ROUGE-Lsum: 29.648, loss: 2.774355173110962, meteor: 0.2824, gen_len: 209.2537 |
| Metrics on cnn_dailymail | ROUGE-1: 9.0885, ROUGE-2: 1.0325, ROUGE-L: 7.3182, ROUGE-Lsum: 8.1455, loss: NaN, gen_len: 210.4762 |
| Metrics on xsum | ROUGE-1: 4.9787, ROUGE-2: 0.3527, ROUGE-L: 4.3679, ROUGE-Lsum: 4.1723, loss: NaN, gen_len: 230.4886 |
| Metrics on scientific_papers (arxiv) | ROUGE-1: 43.4702, ROUGE-2: 17.4297, ROUGE-L: 26.2587, ROUGE-Lsum: 35.5587, loss: 2.1113228797912598, gen_len: 183.3702 |
| Metrics on samsum | ROUGE-1: 3.621, ROUGE-2: 0.1699, ROUGE-L: 3.2016, ROUGE-Lsum: 3.3269, loss: 7.664482116699219, gen_len: 233.8107 |
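The card does not state how these numbers were produced; as an illustration of the kind of tooling involved, ROUGE scores of this form can be computed with the evaluate library (an assumption on our part, not part of the original evaluation setup).

import evaluate

# Compute ROUGE-1/2/L/Lsum for a toy prediction-reference pair
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the model generates a summary of the paper"],
    references=["a reference summary written by the authors"],
)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum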
BibTeX entry and citation info
@misc{zaheer2021big,
title={Big Bird: Transformers for Longer Sequences},
author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
year={2021},
eprint={2007.14062},
archivePrefix={arXiv},
primaryClass={cs.LG}
}