pszemraj/led-large-book-summary
This model is designed for text summarization, especially of long-form narrative texts. It is trained on BookSum, a collection of long-form narrative data that addresses the limitations of existing summarization datasets, and it can handle long documents with non-trivial causal and temporal dependencies, a more challenging and realistic setting for summarization.
Quick Start
You can use this model for summarization tasks with the following parameters:
parameters = {
    "max_length": 64,                   # maximum summary length in tokens
    "min_length": 8,                    # minimum summary length in tokens
    "no_repeat_ngram_size": 3,          # block repeated trigrams in the output
    "early_stopping": True,             # stop beam search once enough candidates finish
    "repetition_penalty": 3.5,          # penalize repeated tokens
    "length_penalty": 0.3,              # exponential length penalty used in beam scoring
    "encoder_no_repeat_ngram_size": 3,  # block trigrams copied verbatim from the input
    "num_beams": 4,                     # beam search width
}
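The same parameters can also be passed through the high-level pipeline API, which wraps the tokenizer and model loading shown later. A minimal sketch, assuming the parameters dict defined above:

from transformers import pipeline

summarizer = pipeline("summarization", model="pszemraj/led-large-book-summary")
result = summarizer("Long input text to summarize goes here.", **parameters)
print(result[0]["summary_text"])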
Features
- Long-form Summarization: Handles input sequences of up to 16,384 tokens, suitable for summarizing long documents such as novels, plays, and stories.
- Low Computational Cost: Relies on windowed local attention combined with global attention on selected tokens, reducing computational complexity compared to full self-attention models such as BERT.
- Multiple Granularity Summaries: Can generate summaries at the paragraph, chapter, and book level (see the chunking sketch following this list).
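Documents longer than the model's input limit are commonly split into chunks (for example, chapters) that are summarized independently and then combined. A minimal sketch, assuming the summarizer pipeline and parameters from the Quick Start, with a hypothetical chapters list standing in for real text:

# hypothetical list of chapter texts from a long book
chapters = ["Chapter 1 text ...", "Chapter 2 text ..."]

# summarize each chapter, then summarize the concatenated chapter summaries
chapter_summaries = [summarizer(ch, **parameters)[0]["summary_text"] for ch in chapters]
book_summary = summarizer(" ".join(chapter_summaries), **parameters)[0]["summary_text"]
print(book_summary)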
Installation
The usage examples below require the Hugging Face transformers library; the optional fine-tuning and evaluation sketches additionally use datasets, evaluate, and rouge_score:
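pip install transformers datasets evaluate rouge_score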
Usage Examples
Basic Usage
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# load the fine-tuned summarization checkpoint and its tokenizer
model_name = "pszemraj/led-large-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
input_text = "large earthquakes along a given fault segment do not occur at random intervals because it takes time to accumulate the strain energy for the rupture. The rates at which tectonic plates move and accumulate strain at their boundaries are approximately uniform. Therefore, in first approximation, one may expect that large ruptures of the same fault segment will occur at approximately constant time intervals. If subsequent main shocks have different amounts of slip across the fault, then the recurrence time may vary, and the basic idea of periodic mainshocks must be modified. For great plate boundary ruptures the length and slip often vary by a factor of 2. Along the southern segment of the San Andreas fault the recurrence interval is 145 years with variations of several decades. The smaller the standard deviation of the average recurrence interval, the more specific could be the long term prediction of a future mainshock."
# tokenize the input and generate a summary with the parameters defined above
inputs = tokenizer(input_text, return_tensors="pt")
summary_ids = model.generate(**inputs, **parameters)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
Advanced Usage
from transformers import TrainingArguments, Trainer

# a tokenized summarization dataset (input texts paired with target summaries)
custom_dataset = ...

training_args = TrainingArguments(
    output_dir="./results",          # where checkpoints are written
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,                # linear learning-rate warmup
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)

trainer = Trainer(
    model=model,                     # the model loaded in the basic usage example
    args=training_args,
    train_dataset=custom_dataset,
)
trainer.train()
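Batches for sequence-to-sequence fine-tuning generally need dynamic padding of both inputs and labels. A minimal sketch using transformers' DataCollatorForSeq2Seq, with the same custom_dataset placeholder as above:

from transformers import DataCollatorForSeq2Seq

# pads input and label sequences to the longest example in each batch
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=custom_dataset,
    data_collator=data_collator,
)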
Documentation
Model Details

| Property | Details |
|----------|---------|
| Model Type | A sequence-to-sequence summarization model based on LED (Longformer Encoder-Decoder), which combines sliding-window local attention with global attention to handle long input sequences. |
| Training Data | The model is fine-tuned on the kmfoda/booksum dataset, which consists of long-form narrative texts from the literature domain. |
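The dataset itself can be loaded and inspected with the datasets library. A minimal sketch, assuming the dataset's standard split layout on the Hugging Face Hub:

from datasets import load_dataset

# download the BookSum dataset used for fine-tuning
booksum = load_dataset("kmfoda/booksum")
print(booksum)  # shows the available splits and columns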
Evaluation Results

The model pszemraj/led-large-book-summary has been evaluated on the kmfoda/booksum dataset with the following results:

| Task | Dataset | Metric | Value |
|------|---------|--------|-------|
| Summarization | kmfoda/booksum (test split) | ROUGE-1 | 31.7308 |
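Scores of this kind can be reproduced with the evaluate library. A minimal sketch, with hypothetical prediction and reference lists standing in for real model outputs:

import evaluate

rouge = evaluate.load("rouge")

# hypothetical outputs and gold summaries for illustration
predictions = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge1"])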
Technical Details
Attention Mechanism
Transformer-based models usually have $O(n^2)$ time and memory complexity in the input length, which is computationally expensive for long sequences. LED (Longformer Encoder-Decoder), the architecture behind this model, replaces the encoder's full self-attention with sliding-window local attention combined with global attention on selected tokens, so cost grows linearly with sequence length. This allows it to handle inputs of up to 16,384 tokens at much lower computational cost than full-attention models like BERT.
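In the transformers implementation of LED, tokens that should attend globally are flagged with a global_attention_mask; giving the first token global attention is a common default. A minimal sketch under that assumption:

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pszemraj/led-large-book-summary")
model = AutoModelForSeq2SeqLM.from_pretrained("pszemraj/led-large-book-summary")

inputs = tokenizer("A very long document ...", return_tensors="pt")

# 1 marks tokens that attend globally; here only the first token
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(**inputs, global_attention_mask=global_attention_mask)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))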
Dataset Challenges
The BookSum dataset used for training and evaluation poses unique challenges for summarization systems: it consists of long-form narrative texts with non-trivial causal and temporal dependencies and rich discourse structures, so systems must process very long documents and model these complex relationships.
License
This project is licensed under the Apache 2.0 and BSD 3-Clause licenses.