BigBirdPegasus model (large)
BigBirdPegasus is a sparse-attention based transformer that extends Transformer models to handle much longer sequences, excelling at long-document summarization and other long-input tasks.
Quick Start
BigBird is a sparse-attention based transformer that extends Transformer-based models such as BERT to much longer sequences. It also comes with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle. BigBird was introduced in this paper and first released in this repository.
Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.
Features
BigBird relies on block sparse attention instead of normal attention (e.g., BERT's full attention). It can handle sequences up to a length of 4096 at a much lower compute cost than BERT, and it has achieved state-of-the-art (SOTA) results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
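As a minimal sketch of what this means in practice, the sparse-attention settings are exposed on the model configuration. The snippet below only inspects the checkpoint's config; the commented values reflect the library defaults for this model and are shown for illustration.

from transformers import BigBirdPegasusConfig

# Inspect the sparse-attention settings of this checkpoint
config = BigBirdPegasusConfig.from_pretrained("google/bigbird-pegasus-large-arxiv")
print(config.attention_type)            # "block_sparse" by default
print(config.max_position_embeddings)   # 4096, the maximum sequence length
print(config.block_size, config.num_random_blocks)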
Usage Examples
Basic Usage
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")

# by default the encoder attention type is `block_sparse`, with block_size=64 and num_random_blocks=3
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv")

# you can switch to full attention by setting `attention_type`
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv", attention_type="original_full")

# or change `block_size` and `num_random_blocks` for the sparse attention
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-arxiv", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors="pt")
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction)
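For a long document, the input can be truncated to the model's 4096-token limit before generation. The sketch below builds on the objects defined above; the file name and the num_beams and max_length values are illustrative placeholders, not settings from the model card.

# Hypothetical end-to-end example (file path and generation settings are placeholders)
long_text = open("paper.txt").read()
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, num_beams=5, max_length=256)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])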
Documentation
This checkpoint is obtained after fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the arxiv subset of the scientific_papers dataset.
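As a sketch, the same data can be loaded with the datasets library (assuming it is installed and the dataset script is available in your version); the field names below are those documented for scientific_papers.

from datasets import load_dataset

# Load the arxiv configuration of scientific_papers; each example has
# "article", "abstract", and "section_names" fields
ds = load_dataset("scientific_papers", "arxiv", split="test")
print(ds[0]["abstract"])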
License
This model is released under the Apache 2.0 license.
Model Metrics
| Property | Details |
|----------|---------|
| Model Name | google/bigbird-pegasus-large-arxiv |
| Datasets | scientific_papers, cnn_dailymail, xsum, samsum |
| Task | Summarization |
| Metrics on scientific_papers (pubmed) | ROUGE-1: 36.0276, ROUGE-2: 13.4166, ROUGE-L: 21.9612, ROUGE-Lsum: 29.648, loss: 2.774355173110962, meteor: 0.2824, gen_len: 209.2537 |
| Metrics on cnn_dailymail | ROUGE-1: 9.0885, ROUGE-2: 1.0325, ROUGE-L: 7.3182, ROUGE-Lsum: 8.1455, loss: NaN, gen_len: 210.4762 |
| Metrics on xsum | ROUGE-1: 4.9787, ROUGE-2: 0.3527, ROUGE-L: 4.3679, ROUGE-Lsum: 4.1723, loss: NaN, gen_len: 230.4886 |
| Metrics on scientific_papers (arxiv) | ROUGE-1: 43.4702, ROUGE-2: 17.4297, ROUGE-L: 26.2587, ROUGE-Lsum: 35.5587, loss: 2.1113228797912598, gen_len: 183.3702 |
| Metrics on samsum | ROUGE-1: 3.621, ROUGE-2: 0.1699, ROUGE-L: 3.2016, ROUGE-Lsum: 3.3269, loss: 7.664482116699219, gen_len: 233.8107 |
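The card does not state how these numbers were produced; as an illustration of the kind of tooling involved, ROUGE scores of this form can be computed with the evaluate library (an assumption on our part, not part of the original evaluation setup).

import evaluate

# Compute ROUGE-1/2/L/Lsum for a toy prediction-reference pair
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the model generates a summary of the paper"],
    references=["a reference summary written by the authors"],
)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum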
BibTeX entry and citation info
@misc{zaheer2021big,
title={Big Bird: Transformers for Longer Sequences},
author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
year={2021},
eprint={2007.14062},
archivePrefix={arXiv},
primaryClass={cs.LG}
}