Staging Pegasus Gmeetsamsum

Developed by kmfoda
PEGASUS is a Transformer-based pre-trained model designed specifically for abstractive summarization. Its gap-sentence pre-training objective yields state-of-the-art performance across multiple summarization benchmarks.
Downloads: 14
Release date: 3/2/2022

Model Overview

PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization) is pre-trained by masking important sentences from a document and generating them as a single pseudo-summary. Because this gap-sentence objective closely resembles the downstream task, the model achieves state-of-the-art results on multiple summarization datasets.
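As a minimal usage sketch, a checkpoint like this can be loaded with the Hugging Face transformers library. The Hub id kmfoda/staging-pegasus-gmeetsamsum is an assumption inferred from the page metadata (model name plus developer); substitute the actual repository name if it differs.

```python
# Minimal inference sketch with Hugging Face transformers.
# NOTE: the Hub id below is assumed from the page metadata, not confirmed.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "kmfoda/staging-pegasus-gmeetsamsum"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = (
    "PEGASUS is pre-trained by masking whole sentences from a document "
    "and generating them as a pseudo-summary, which transfers well to "
    "downstream abstractive summarization tasks."
)

# Tokenize, generate with beam search, and decode the summary.
inputs = tokenizer(article, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Beam search with a modest max_length is a common default for PEGASUS-style summarizers; both are worth tuning per dataset.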

Model Features

Mixed dataset training
Trained on both the C4 and HugeNews corpora, mixed and weighted by dataset size, which improves the model's generalization across domains.
Improved sentence sampling strategy
Samples the gap-sentence ratio uniformly between 15% and 45% and adds 20% uniform noise to importance scores when selecting sentences to mask, making pre-training more robust (see the sketch after this list).
Optimized tokenizer
Upgrades the SentencePiece tokenizer to encode newline characters, preserving paragraph-boundary information that was previously lost.
Extended training duration
Increases pre-training to 1.5 million steps, giving the pre-training perplexity time to converge more fully.
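To make the sampling strategy concrete, below is a toy Python sketch (not the original training code) of its two stochastic elements: a gap-sentence ratio drawn uniformly from 15%-45%, and 20% uniform noise applied to sentence-importance scores before the top-ranked sentences are chosen for masking. The multiplicative ±20% noise and the example importance scores are illustrative assumptions.

```python
# Toy sketch of stochastic gap-sentence selection, assuming importance
# scores are given (e.g., ROUGE overlap with the rest of the document).
import random

def select_gap_sentences(sentences, importance_scores):
    """Pick indices of sentences to mask for gap-sentence generation."""
    gsr = random.uniform(0.15, 0.45)            # uniformly sampled gap-sentence ratio
    n_gap = max(1, round(gsr * len(sentences)))
    noisy = [s * random.uniform(0.8, 1.2)        # ±20% noise (one interpretation)
             for s in importance_scores]
    ranked = sorted(range(len(sentences)), key=lambda i: noisy[i], reverse=True)
    return sorted(ranked[:n_gap])                # masked-sentence indices, in order

# Example with placeholder sentences and scores.
sents = ["s0", "s1", "s2", "s3", "s4", "s5"]
scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1]
print(select_gap_sentences(sents, scores))
```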

Model Capabilities

Abstractive summary generation
Multi-domain summarization adaptation
Long-document processing

Use Cases

News summarization
CNN/Daily Mail news summarization: generates concise, accurate summaries of news articles.
ROUGE-1/2/L: 44.16 / 21.56 / 41.30
XSum extreme summarization: generates single-sentence extreme summaries.
ROUGE-1/2/L: 47.60 / 24.83 / 39.64
Academic paper summarization
arXiv paper summarization: generates technical summaries of academic papers.
ROUGE-1/2/L: 44.21 / 16.95 / 25.67
PubMed medical summarization: generates specialist summaries of medical literature.
ROUGE-1/2/L: 45.97 / 20.15 / 28.25
Legal document summarization
Bill summarization (BillSum): generates concise summaries of legislative bills.
ROUGE-1/2/L: 59.67 / 41.58 / 47.59
A sketch of how such ROUGE figures are computed follows this list.
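The figures above are ROUGE F1 scores. As a hedged sketch, scores in this style can be computed with Google's rouge-score package (pip install rouge-score); the reference and generated texts here are placeholders, and rougeLsum is used on the assumption that the reported ROUGE-L is the summary-level variant common in summarization papers.

```python
# Sketch of ROUGE-1/2/L evaluation using the rouge-score package.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeLsum"],
                                  use_stemmer=True)
reference = "The senate passed the bill after a lengthy debate."   # placeholder
generated = "The bill passed the senate following long debate."    # placeholder
scores = scorer.score(reference, generated)
for name, s in scores.items():
    print(f"{name}: F1 = {s.fmeasure:.4f}")
```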