🚀 pszemraj/bigbird-pegasus-large-K-booksum
This is a summarization model based on BigBird-Pegasus. It addresses the high computational cost of Transformer-based models on long sequences by using block sparse attention, allowing it to handle input sequences of up to 4096 tokens at a much lower cost.
✨ Features
- Summarization-Oriented: Specialized for summarization tasks, with good performance on datasets like kmfoda/booksum.
- Efficient on Long Sequences: Uses block sparse attention to reduce computational complexity compared to traditional Transformer-based models, handling sequences up to 4096 tokens in length (see the tokenization sketch after this list).
- SOTA Results: The underlying BigBird architecture has achieved state-of-the-art results on various tasks involving very long sequences.
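As a quick illustration of the 4096-token input budget, a long document can be tokenized with truncation at that length before summarization. This is a minimal sketch; `long_document` is a placeholder for your own text:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pszemraj/bigbird-pegasus-large-K-booksum")

long_document = "..."  # placeholder: any long text, e.g. a book chapter

# Truncate to the model's 4096-token input limit
inputs = tokenizer(long_document, truncation=True, max_length=4096, return_tensors="pt")
print(inputs["input_ids"].shape)  # torch.Size([1, n]) with n <= 4096
```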
📦 Installation
The model is used through the 🤗 Transformers library: `pip install transformers`.
💻 Usage Examples
Basic Usage
You can use this model for summarization tasks with 🤗 Transformers. Here is a basic example:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("pszemraj/bigbird-pegasus-large-K-booksum")
model = AutoModelForSeq2SeqLM.from_pretrained("pszemraj/bigbird-pegasus-large-K-booksum")

input_text = "large earthquakes along a given fault segment do not occur at random intervals because it takes time to accumulate the strain energy for the rupture. The rates at which tectonic plates move and accumulate strain at their boundaries are approximately uniform. Therefore, in first approximation, one may expect that large ruptures of the same fault segment will occur at approximately constant time intervals. If subsequent main shocks have different amounts of slip across the fault, then the recurrence time may vary, and the basic idea of periodic mainshocks must be modified. For great plate boundary ruptures the length and slip often vary by a factor of 2. Along the southern segment of the San Andreas fault the recurrence interval is 145 years with variations of several decades. The smaller the standard deviation of the average recurrence interval, the more specific could be the long term prediction of a future mainshock."

# Tokenize the input and generate a summary with beam search
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, no_repeat_ngram_size=2, encoder_no_repeat_ngram_size=3, repetition_penalty=2.4, length_penalty=0.5, num_beams=4, early_stopping=True)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```
Advanced Usage
In more complex scenarios, you can adjust the generation parameters to suit your requirements. For example, to trade some summary quality for faster decoding, you can reduce the `num_beams` parameter (for genuinely more varied output, see the sampling sketch after this example):
```python
# Same settings as above, but with a smaller beam width
outputs = model.generate(**inputs, max_length=64, no_repeat_ngram_size=2, encoder_no_repeat_ngram_size=3, repetition_penalty=2.4, length_penalty=0.5, num_beams=2, early_stopping=True)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```
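If the goal is more varied output rather than faster decoding, sampling-based generation is another option. This sketch is not taken from the original card; the parameter values are illustrative and it reuses `model`, `tokenizer`, and `inputs` from the Basic Usage example:

```python
# Sampling-based decoding for more varied summaries (illustrative values)
outputs = model.generate(
    **inputs,
    max_length=64,
    do_sample=True,        # sample from the distribution instead of pure beam search
    top_p=0.9,             # nucleus sampling
    temperature=0.8,
    no_repeat_ngram_size=2,
)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```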
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | BigBird-Pegasus-based model for summarization |
| Training Data | kmfoda/booksum |
Inference Parameters
- `max_length`: 64
- `no_repeat_ngram_size`: 2
- `encoder_no_repeat_ngram_size`: 3
- `repetition_penalty`: 2.4
- `length_penalty`: 0.5
- `num_beams`: 4
- `early_stopping`: true
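These defaults can also be passed directly to a 🤗 Transformers `pipeline`, which forwards them to `model.generate`. A minimal sketch, where `text` is a placeholder for the document to summarize:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="pszemraj/bigbird-pegasus-large-K-booksum")

text = "..."  # placeholder: the document to summarize

result = summarizer(
    text,
    max_length=64,
    no_repeat_ngram_size=2,
    encoder_no_repeat_ngram_size=3,
    repetition_penalty=2.4,
    length_penalty=0.5,
    num_beams=4,
    early_stopping=True,
)
print(result[0]["summary_text"])
```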
Model Performance
The model has been evaluated on the kmfoda/booksum dataset, and the following are the ROUGE scores:
| Metric | Value |
|--------|-------|
| ROUGE-1 | 34.0757 |
| ROUGE-2 | 5.9177 |
| ROUGE-L | 16.3874 |
| ROUGE-LSUM | 31.6118 |
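For reference, scores like these can be computed with the 🤗 `evaluate` library. A minimal sketch, assuming the model's generated summaries and the kmfoda/booksum reference summaries are already available as lists of strings (the one-element lists below are placeholders):

```python
import evaluate  # pip install evaluate rouge_score

rouge = evaluate.load("rouge")

# predictions: model-generated summaries; references: gold summaries from kmfoda/booksum
predictions = ["model-generated summary ..."]
references = ["reference summary ..."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```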
🔧 Technical Details
The model is based on the BigBird architecture, which uses block sparse attention instead of full (dense) attention. This allows it to handle sequences up to 4096 tokens at a much lower computational cost compared to models such as BERT. By approximating the full attention matrix, BigBird reduces the time and memory complexity of self-attention from \(O(n^{2})\) to linear in the sequence length, making it well suited to long-sequence NLP tasks.
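In 🤗 Transformers, the sparse-attention behaviour is exposed through configuration options on the BigBird-Pegasus model class. A minimal sketch; the specific values shown are illustrative defaults, not settings documented on this card:

```python
from transformers import BigBirdPegasusForConditionalGeneration

# Load with block sparse attention; block_size and num_random_blocks
# control the sparsity pattern of the attention.
model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    "pszemraj/bigbird-pegasus-large-K-booksum",
    attention_type="block_sparse",
    block_size=64,
    num_random_blocks=3,
)

# For short inputs, full (quadratic) attention can be used instead:
# model = BigBirdPegasusForConditionalGeneration.from_pretrained(
#     "pszemraj/bigbird-pegasus-large-K-booksum", attention_type="original_full"
# )
```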
📄 License
This model is released under the Apache-2.0 license.