📚 long-t5-tglobal-xl + BookSum
This model can summarize long text and generate a SparkNotes-like summary for any topic! It generalizes well to academic and narrative text. The XL checkpoint produces even better summaries from a human evaluation perspective.
📄 License
🏷️ Tags
- summarization
- summary
- booksum
- long-document
- long-form
- tglobal-xl
- XL
📊 Datasets
📈 Metrics
⚠️ Inference
Inference is disabled.
📋 Model Index

| Property | Details |
| --- | --- |
| Model Name | pszemraj/long-t5-tglobal-xl-16384-book-summary |
| Task | summarization |
| Dataset | kmfoda/booksum |
| Metrics | ROUGE (see Eval results below) |
⚠️ Important Note
As of this discussion, there are known issues with long-t5 models when using transformers >= 4.23.0. Please run pip install transformers==4.22.0 to ensure good performance with this model.
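To confirm the pinned version is active, a quick sanity check (standard transformers attribute):

import transformers
print(transformers.__version__)  # expect 4.22.0 after the pin above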
A simple example/use case with the base model on ASR is here.
✨ Features
- Generalizes well to academic & narrative text.
- The XL checkpoint produces better summaries.
📦 Installation
pip install -U transformers
💻 Usage Examples
Basic Usage
import torch
from transformers import pipeline

# load the summarization pipeline; use the GPU if one is available, else CPU
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Here is a lot of text I don't want to read. Replace me"

result = summarizer(long_text)
print(result[0]["summary_text"])
Advanced Usage
Adjusting parameters
Pass other beam-search text-generation parameters when calling summarizer to get even higher-quality results, as in the sketch below.
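For illustration, here is a minimal sketch of passing generation parameters through the pipeline; the specific values below are assumptions for demonstration, not tuned defaults:

result = summarizer(
    long_text,
    num_beams=4,              # wider beam search than the greedy default
    no_repeat_ngram_size=3,   # discourage repeated phrases
    early_stopping=True,      # stop when all beams have finished
)
print(result[0]["summary_text"])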
LLM.int8 Quantization
First, make sure you have the latest versions of the relevant packages:
pip install -U transformers bitsandbytes accelerate
Load in 8-bit:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(
    "pszemraj/long-t5-tglobal-xl-16384-book-summary"
)

# load_in_8bit quantizes the weights via bitsandbytes (LLM.int8);
# device_map="auto" places the model across the available devices
model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    load_in_8bit=True,
    device_map="auto",
)
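With the tokenizer and 8-bit model loaded as above, inference can be run with model.generate. This is a minimal sketch: the input string and the generation settings (max_new_tokens, num_beams) are illustrative assumptions, not recommended values.

long_text = "Replace me with the document to summarize."
inputs = tokenizer(long_text, return_tensors="pt").to(model.device)
# generate the summary token ids, then decode them back to text
summary_ids = model.generate(**inputs, max_new_tokens=512, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))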
📖 Documentation
Description
A fine-tuned version of google/long-t5-tglobal-xl on the kmfoda/booksum dataset. Read the paper by Guo et al., LongT5: Efficient Text-To-Text Transformer for Long Sequences, for details on the architecture.
Intended uses & limitations
While this model seems to improve factual consistency, do not take its summaries as foolproof; check anything that looks odd. Be especially careful with negation statements: you can usually verify a statement by comparing it with what the surrounding sentences imply.
Training and evaluation data
The kmfoda/booksum dataset on Hugging Face - read the original paper here.
- For initial fine-tuning, only examples with at most 12288 input tokens and 1024 output tokens were used. A quick analysis showed that summaries in the 12288-16384 token range are a small minority in this dataset.
- The final stages of fine-tuning used the standard 16384-token input / 1024-token output lengths. A hedged sketch of this kind of length filter follows the list.
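For illustration only, this is a sketch of the kind of token-length filter described above, not the author's actual preprocessing code; the column names "chapter" and "summary_text" are assumptions about the kmfoda/booksum schema:

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-xl")
ds = load_dataset("kmfoda/booksum", split="train")

def within_limits(example):
    # count tokens for the source chapter and the reference summary
    n_in = len(tokenizer(example["chapter"])["input_ids"])    # assumed column name
    n_out = len(tokenizer(example["summary_text"])["input_ids"])  # assumed column name
    return n_in <= 12288 and n_out <= 1024

ds = ds.filter(within_limits)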
Eval results
Official results with the model evaluator will be computed and posted here. The model achieves the following results on the evaluation set:
- eval_loss: 1.2756
- eval_rouge1: 41.8013
- eval_rouge2: 12.0895
- eval_rougeL: 21.6007
- eval_rougeLsum: 39.5382
- eval_gen_len: 387.2945
- eval_runtime: 13908.4995 (seconds)
- eval_samples_per_second: 0.107
- eval_steps_per_second: 0.027
FAQ
How can I run inference with this on CPU?
lol (technically you can, e.g. by passing device=-1 to the pipeline, but inference with an XL model on CPU is impractically slow).
How to run inference over a very long (30k+ tokens) document in batches?
See summarize.py in the code for my Hugging Face Space, Document Summarization. You can use the same approach to split a document into batches of 4096 tokens or so and iterate over them with the model, which is useful when CUDA memory is limited; a hedged sketch of this chunking approach follows.
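For illustration only, a sketch of the chunk-and-iterate approach; this is not the Space's actual code, the helper summarize_long is hypothetical, and it reuses the summarizer pipeline from the earlier examples:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pszemraj/long-t5-tglobal-xl-16384-book-summary")

def summarize_long(text, summarizer, tokenizer, chunk_tokens=4096):
    # tokenize the full document, split the ids into fixed-size chunks,
    # then summarize each chunk independently and join the partial summaries
    ids = tokenizer(text)["input_ids"]
    chunks = [ids[i : i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]
    partial = []
    for chunk in chunks:
        chunk_text = tokenizer.decode(chunk, skip_special_tokens=True)
        partial.append(summarizer(chunk_text)[0]["summary_text"])
    return "\n".join(partial)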
How to fine-tune further?
See train with a script and the summarization scripts.
Are there simpler ways to run this?
Yes. For exactly this reason, I created a Python package utility called textsum. You can use it to load models and summarize documents in a few lines of code.
pip install textsum
Use textsum in Python with this model:

from textsum.summarize import Summarizer

summarizer = Summarizer(
    model_name_or_path="pszemraj/long-t5-tglobal-xl-16384-book-summary"
)

# summarize_string runs the model over a raw text string
long_string = "This is a long string of text that will be summarized."
out_str = summarizer.summarize_string(long_string)
print(f"summary: {out_str}")
🔧 Technical Details
Training procedure
Updates
TBD
Training hyperparameters
TBD
Framework versions
TBD