🚀 pszemraj/bigbird-pegasus-large-K-booksum
This is a summarization model based on BigBird-Pegasus. It addresses the high computational cost of Transformer-based models on long sequences by using block sparse attention, allowing it to handle input sequences of up to 4096 tokens at a much lower cost.
✨ Features
- Summarization-Oriented: Specialized for summarization tasks, with good performance on datasets like kmfoda/booksum.
- Efficient on Long Sequences: Uses block sparse attention to reduce computational complexity compared to traditional Transformer-based models, handling sequences up to 4096 tokens in length (see the tokenization sketch after this list).
- SOTA Results: The underlying BigBird architecture has achieved state-of-the-art results on various tasks involving very long sequences.
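As a quick illustration of the 4096-token input budget, a long document can be tokenized with truncation at that length before summarization. This is a minimal sketch; `long_document` is a placeholder for your own text:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pszemraj/bigbird-pegasus-large-K-booksum")

long_document = "..."  # placeholder: any long text, e.g. a book chapter

# Truncate to the model's 4096-token input limit
inputs = tokenizer(long_document, truncation=True, max_length=4096, return_tensors="pt")
print(inputs["input_ids"].shape)  # torch.Size([1, n]) with n <= 4096
```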
📦 Installation
The model is used through the 🤗 Transformers library: `pip install transformers`.
💻 Usage Examples
Basic Usage
You can use this model for summarization tasks with 🤗 Transformers. Here is a basic example:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("pszemraj/bigbird-pegasus-large-K-booksum")
model = AutoModelForSeq2SeqLM.from_pretrained("pszemraj/bigbird-pegasus-large-K-booksum")

input_text = "large earthquakes along a given fault segment do not occur at random intervals because it takes time to accumulate the strain energy for the rupture. The rates at which tectonic plates move and accumulate strain at their boundaries are approximately uniform. Therefore, in first approximation, one may expect that large ruptures of the same fault segment will occur at approximately constant time intervals. If subsequent main shocks have different amounts of slip across the fault, then the recurrence time may vary, and the basic idea of periodic mainshocks must be modified. For great plate boundary ruptures the length and slip often vary by a factor of 2. Along the southern segment of the San Andreas fault the recurrence interval is 145 years with variations of several decades. The smaller the standard deviation of the average recurrence interval, the more specific could be the long term prediction of a future mainshock."

# Tokenize the input and generate a summary with beam search
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, no_repeat_ngram_size=2, encoder_no_repeat_ngram_size=3, repetition_penalty=2.4, length_penalty=0.5, num_beams=4, early_stopping=True)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```
Advanced Usage
In more complex scenarios, you can adjust the generation parameters to suit your requirements. For example, to trade some summary quality for faster decoding, you can reduce the `num_beams` parameter (for genuinely more varied output, see the sampling sketch after this example):
```python
# Same settings as above, but with a smaller beam width
outputs = model.generate(**inputs, max_length=64, no_repeat_ngram_size=2, encoder_no_repeat_ngram_size=3, repetition_penalty=2.4, length_penalty=0.5, num_beams=2, early_stopping=True)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```
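If the goal is more varied output rather than faster decoding, sampling-based generation is another option. This sketch is not taken from the original card; the parameter values are illustrative and it reuses `model`, `tokenizer`, and `inputs` from the Basic Usage example:

```python
# Sampling-based decoding for more varied summaries (illustrative values)
outputs = model.generate(
    **inputs,
    max_length=64,
    do_sample=True,        # sample from the distribution instead of pure beam search
    top_p=0.9,             # nucleus sampling
    temperature=0.8,
    no_repeat_ngram_size=2,
)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```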
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | BigBird-Pegasus-based model for summarization |
| Training Data | kmfoda/booksum |
Inference Parameters
- `max_length`: 64
- `no_repeat_ngram_size`: 2
- `encoder_no_repeat_ngram_size`: 3
- `repetition_penalty`: 2.4
- `length_penalty`: 0.5
- `num_beams`: 4
- `early_stopping`: true
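These defaults can also be passed directly to a 🤗 Transformers `pipeline`, which forwards them to `model.generate`. A minimal sketch, where `text` is a placeholder for the document to summarize:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="pszemraj/bigbird-pegasus-large-K-booksum")

text = "..."  # placeholder: the document to summarize

result = summarizer(
    text,
    max_length=64,
    no_repeat_ngram_size=2,
    encoder_no_repeat_ngram_size=3,
    repetition_penalty=2.4,
    length_penalty=0.5,
    num_beams=4,
    early_stopping=True,
)
print(result[0]["summary_text"])
```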
Model Performance
The model has been evaluated on the kmfoda/booksum dataset, and the following are the ROUGE scores:
| Metric | Value |
|--------|-------|
| ROUGE-1 | 34.0757 |
| ROUGE-2 | 5.9177 |
| ROUGE-L | 16.3874 |
| ROUGE-LSUM | 31.6118 |
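For reference, scores like these can be computed with the 🤗 `evaluate` library. A minimal sketch, assuming the model's generated summaries and the kmfoda/booksum reference summaries are already available as lists of strings (the one-element lists below are placeholders):

```python
import evaluate  # pip install evaluate rouge_score

rouge = evaluate.load("rouge")

# predictions: model-generated summaries; references: gold summaries from kmfoda/booksum
predictions = ["model-generated summary ..."]
references = ["reference summary ..."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```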
🔧 Technical Details
The model is based on the BigBird architecture, which uses block sparse attention instead of full (dense) attention. This allows it to handle sequences up to 4096 tokens at a much lower computational cost compared to models such as BERT. By approximating the full attention matrix, BigBird reduces the time and memory complexity of self-attention from \(O(n^{2})\) to linear in the sequence length, making it well suited to long-sequence NLP tasks.
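In 🤗 Transformers, the sparse-attention behaviour is exposed through configuration options on the BigBird-Pegasus model class. A minimal sketch; the specific values shown are illustrative defaults, not settings documented on this card:

```python
from transformers import BigBirdPegasusForConditionalGeneration

# Load with block sparse attention; block_size and num_random_blocks
# control the sparsity pattern of the attention.
model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    "pszemraj/bigbird-pegasus-large-K-booksum",
    attention_type="block_sparse",
    block_size=64,
    num_random_blocks=3,
)

# For short inputs, full (quadratic) attention can be used instead:
# model = BigBirdPegasusForConditionalGeneration.from_pretrained(
#     "pszemraj/bigbird-pegasus-large-K-booksum", attention_type="original_full"
# )
```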
📄 License
This model is released under the Apache-2.0 license.