BigBirdPegasus model (large)
BigBirdPegasus is a large-scale model based on the BigBird architecture. It extends Transformer-based models to handle much longer sequences, offering significant advantages in tasks such as long-document summarization.
Quick Start
Basic Usage
Here is how to use this model to summarize a piece of text in PyTorch:
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")

# by default the encoder attention type is `block_sparse` (block_size=64, num_random_blocks=3)
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

# you can switch `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors='pt')
prediction = model.generate(**inputs)
prediction = tokenizer.batch_decode(prediction)
Features
- Block Sparse Attention: BigBird relies on block sparse attention instead of normal full attention (i.e., BERT's attention). This allows it to handle sequences of up to 4096 tokens at a much lower compute cost than BERT (see the summarization sketch after this list).
- SOTA Performance: It has achieved state-of-the-art (SOTA) results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
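The block sparse mechanism is what makes it practical to feed an entire article to the model in one pass. Below is a minimal sketch of long-document summarization with this checkpoint; the generation settings (beam count, summary length) are illustrative assumptions, not the configuration used to produce the metrics reported further down.

from transformers import AutoTokenizer, BigBirdPegasusForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

long_document = "..."  # placeholder: a scientific article or other long text

# truncate the input at BigBird's 4096-token limit
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=4096)

# generation hyperparameters here are illustrative, not the settings behind the reported scores
summary_ids = model.generate(**inputs, num_beams=5, max_length=256)
summary = tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]
print(summary)

Inputs longer than 4096 tokens are simply truncated in this sketch, so very long papers would need to be chunked separately.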
Documentation
Model description
BigBird extends Transformer-based models such as BERT. It was introduced in the paper Big Bird: Transformers for Longer Sequences (Zaheer et al., cited in the BibTeX entry below) and first released in the authors' repository. The model uses block sparse attention, enabling it to process long sequences more efficiently than full attention.
Training Procedure
This checkpoint was obtained by fine-tuning BigBirdPegasusForConditionalGeneration for summarization on the pubmed subset of the scientific_papers dataset.
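The exact fine-tuning recipe is not documented in this card. The sketch below shows one plausible way such a run could be set up with the datasets library and Seq2SeqTrainer; every hyperparameter, the output path, and the preprocessing choices are assumptions for illustration only.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BigBirdPegasusForConditionalGeneration,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

# the pubmed subset pairs full articles with their abstracts
dataset = load_dataset("scientific_papers", "pubmed")

def preprocess(batch):
    model_inputs = tokenizer(batch["article"], truncation=True, max_length=4096)
    labels = tokenizer(text_target=batch["abstract"], truncation=True, max_length=256)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="bigbird-pegasus-pubmed-finetuned",  # placeholder path
    per_device_train_batch_size=1,  # long inputs; batch size kept small for memory
    learning_rate=5e-5,             # placeholder value
    num_train_epochs=1,             # placeholder value
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()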
License
This model is licensed under the Apache-2.0 license.
Model Metrics
| Property | Details |
|----------|---------|
| Model Type | BigBirdPegasus (large) |
| Training Data | scientific_papers (pubmed, arxiv) |
Pubmed Dataset
| Metric | Value |
|--------|-------|
| ROUGE-1 | 40.8966 |
| ROUGE-2 | 18.1161 |
| ROUGE-L | 26.1743 |
| ROUGE-LSUM | 34.2773 |
| loss | 2.1707184314727783 |
| meteor | 0.3513 |
| gen_len | 221.2531 |
Arxiv Dataset
| Metric | Value |
|--------|-------|
| ROUGE-1 | 40.3815 |
| ROUGE-2 | 14.374 |
| ROUGE-L | 23.4773 |
| ROUGE-LSUM | 33.772 |
| loss | 3.235051393508911 |
| gen_len | 186.2003 |
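The card does not state how these scores were computed. As a rough illustration, ROUGE values in this style can be obtained with the evaluate library; the inputs below are placeholders, and the exact scoring setup behind the tables above may differ.

import evaluate

rouge = evaluate.load("rouge")

# placeholders: model-generated summaries and the matching reference abstracts
predictions = ["the model's generated summary ..."]
references = ["the reference abstract ..."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # contains 'rouge1', 'rouge2', 'rougeL', 'rougeLsum'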
BibTeX entry and citation info
@misc{zaheer2021big,
title={Big Bird: Transformers for Longer Sequences},
author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
year={2021},
eprint={2007.14062},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Important Note
The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.