pszemraj/led-large-book-summary
This model is designed for text summarization, especially of long-form narrative texts. It is trained on BookSum, a collection of long-form narrative data that addresses the limitations of existing summarization datasets, and it can handle long documents with non-trivial causal and temporal dependencies, a more challenging and realistic setting for summarization.
Quick Start
You can use this model for summarization tasks with the following parameters:
parameters = {
    "max_length": 64,                   # maximum summary length in tokens
    "min_length": 8,                    # minimum summary length in tokens
    "no_repeat_ngram_size": 3,          # block repeated trigrams in the output
    "early_stopping": True,             # stop beam search once enough candidates finish
    "repetition_penalty": 3.5,          # penalize repeated tokens
    "length_penalty": 0.3,              # exponential length penalty used in beam scoring
    "encoder_no_repeat_ngram_size": 3,  # block trigrams copied verbatim from the input
    "num_beams": 4,                     # beam search width
}
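The same parameters can also be passed through the high-level pipeline API, which wraps the tokenizer and model loading shown later. A minimal sketch, assuming the parameters dict defined above:

from transformers import pipeline

summarizer = pipeline("summarization", model="pszemraj/led-large-book-summary")
result = summarizer("Long input text to summarize goes here.", **parameters)
print(result[0]["summary_text"])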
Features
- Long-form Summarization: Handles input sequences of up to 16,384 tokens, suitable for summarizing long documents such as novels, plays, and stories.
- Low Computational Cost: Relies on windowed local attention combined with global attention on selected tokens, reducing computational complexity compared to full self-attention models such as BERT.
- Multiple Granularity Summaries: Can generate summaries at the paragraph, chapter, and book level (see the chunking sketch following this list).
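Documents longer than the model's input limit are commonly split into chunks (for example, chapters) that are summarized independently and then combined. A minimal sketch, assuming the summarizer pipeline and parameters from the Quick Start, with a hypothetical chapters list standing in for real text:

# hypothetical list of chapter texts from a long book
chapters = ["Chapter 1 text ...", "Chapter 2 text ..."]

# summarize each chapter, then summarize the concatenated chapter summaries
chapter_summaries = [summarizer(ch, **parameters)[0]["summary_text"] for ch in chapters]
book_summary = summarizer(" ".join(chapter_summaries), **parameters)[0]["summary_text"]
print(book_summary)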
Installation
The usage examples below require the Hugging Face transformers library; the optional fine-tuning and evaluation sketches additionally use datasets, evaluate, and rouge_score:
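pip install transformers datasets evaluate rouge_score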
Usage Examples
Basic Usage
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# load the fine-tuned summarization checkpoint and its tokenizer
model_name = "pszemraj/led-large-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
input_text = "large earthquakes along a given fault segment do not occur at random intervals because it takes time to accumulate the strain energy for the rupture. The rates at which tectonic plates move and accumulate strain at their boundaries are approximately uniform. Therefore, in first approximation, one may expect that large ruptures of the same fault segment will occur at approximately constant time intervals. If subsequent main shocks have different amounts of slip across the fault, then the recurrence time may vary, and the basic idea of periodic mainshocks must be modified. For great plate boundary ruptures the length and slip often vary by a factor of 2. Along the southern segment of the San Andreas fault the recurrence interval is 145 years with variations of several decades. The smaller the standard deviation of the average recurrence interval, the more specific could be the long term prediction of a future mainshock."
# tokenize the input and generate a summary with the parameters defined above
inputs = tokenizer(input_text, return_tensors="pt")
summary_ids = model.generate(**inputs, **parameters)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
Advanced Usage
from transformers import TrainingArguments, Trainer

# a tokenized summarization dataset (input texts paired with target summaries)
custom_dataset = ...

training_args = TrainingArguments(
    output_dir="./results",          # where checkpoints are written
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,                # linear learning-rate warmup
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)

trainer = Trainer(
    model=model,                     # the model loaded in the basic usage example
    args=training_args,
    train_dataset=custom_dataset,
)
trainer.train()
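Batches for sequence-to-sequence fine-tuning generally need dynamic padding of both inputs and labels. A minimal sketch using transformers' DataCollatorForSeq2Seq, with the same custom_dataset placeholder as above:

from transformers import DataCollatorForSeq2Seq

# pads input and label sequences to the longest example in each batch
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=custom_dataset,
    data_collator=data_collator,
)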
Documentation
Model Details

| Property | Details |
|----------|---------|
| Model Type | A sequence-to-sequence summarization model based on LED (Longformer Encoder-Decoder), which combines sliding-window local attention with global attention to handle long input sequences. |
| Training Data | The model is fine-tuned on the kmfoda/booksum dataset, which consists of long-form narrative texts from the literature domain. |
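The dataset itself can be loaded and inspected with the datasets library. A minimal sketch, assuming the dataset's standard split layout on the Hugging Face Hub:

from datasets import load_dataset

# download the BookSum dataset used for fine-tuning
booksum = load_dataset("kmfoda/booksum")
print(booksum)  # shows the available splits and columns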
Evaluation Results

The model pszemraj/led-large-book-summary has been evaluated on the kmfoda/booksum dataset with the following results:

| Task | Dataset | Metric | Value |
|------|---------|--------|-------|
| Summarization | kmfoda/booksum (test split) | ROUGE-1 | 31.7308 |
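Scores of this kind can be reproduced with the evaluate library. A minimal sketch, with hypothetical prediction and reference lists standing in for real model outputs:

import evaluate

rouge = evaluate.load("rouge")

# hypothetical outputs and gold summaries for illustration
predictions = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge1"])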
Technical Details
Attention Mechanism
Transformer-based models usually have $O(n^2)$ time and memory complexity in the input length, which is computationally expensive for long sequences. LED (Longformer Encoder-Decoder), the architecture behind this model, replaces the encoder's full self-attention with sliding-window local attention combined with global attention on selected tokens, so cost grows linearly with sequence length. This allows it to handle inputs of up to 16,384 tokens at much lower computational cost than full-attention models like BERT.
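In the transformers implementation of LED, tokens that should attend globally are flagged with a global_attention_mask; giving the first token global attention is a common default. A minimal sketch under that assumption:

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pszemraj/led-large-book-summary")
model = AutoModelForSeq2SeqLM.from_pretrained("pszemraj/led-large-book-summary")

inputs = tokenizer("A very long document ...", return_tensors="pt")

# 1 marks tokens that attend globally; here only the first token
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(**inputs, global_attention_mask=global_attention_mask)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))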
Dataset Challenges
The BookSum dataset used for training and evaluation poses unique challenges for summarization systems: it consists of long-form narrative texts with non-trivial causal and temporal dependencies and rich discourse structures, so systems must process very long documents and model these complex relationships.
License
This project is licensed under the Apache 2.0 and BSD 3-Clause licenses.