
Distill Pegasus Xsum 16 4

Developed by sshleifer
PEGASUS is an abstractive summarization model pre-trained with extracted gap sentences, developed by the Google Research team.
Downloads 137
Release Time: 3/2/2022

Model Overview

PEGASUS is pre-trained specifically for text summarization: important sentences are removed from the input document and the model learns to generate them, which makes it well suited to abstractive summarization. This checkpoint, distill-pegasus-xsum-16-4, is a distilled PEGASUS variant for the XSum summarization task.
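Below is a minimal usage sketch, not part of the original card, showing how the checkpoint can be loaded and used for summarization with the Hugging Face transformers library. The repository id sshleifer/distill-pegasus-xsum-16-4, the sample article, and the generation settings are assumptions for illustration.

```python
# Minimal usage sketch (assumed repository id and generation settings).
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "sshleifer/distill-pegasus-xsum-16-4"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

article = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and was the tallest man-made structure in the world for 41 years."
)

# Tokenize the article, generate an abstractive summary, and decode it.
inputs = tokenizer(article, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```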

Model Features

Mixed and stochastic training
Trained on the C4 and HugeNews datasets together, with the gap-sentence ratio sampled randomly and noise added to the importance scores used to select gap sentences.
Multi-dataset adaptation
Supports various summarization datasets including xsum, cnn_dailymail, and newsroom.
Tokenizer optimization
Updated the SentencePiece tokenizer to encode the newline character, improving the handling of paragraph breaks.
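As a small illustration of the tokenizer feature above, the following sketch (an assumption, not from the original card) encodes a two-paragraph input and prints the resulting tokens so the treatment of the line break can be inspected.

```python
# Illustrative check of newline handling by the updated SentencePiece tokenizer.
from transformers import PegasusTokenizer

tokenizer = PegasusTokenizer.from_pretrained("sshleifer/distill-pegasus-xsum-16-4")

text = "First paragraph of the document.\nSecond paragraph of the document."
token_ids = tokenizer(text).input_ids

# Inspect how the line break is represented and whether it survives decoding.
print(tokenizer.convert_ids_to_tokens(token_ids))
print(tokenizer.decode(token_ids, skip_special_tokens=True))
```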

Model Capabilities

Text summarization
Multilingual support
Abstractive summarization

Use Cases

News summarization
Automatic summary generation for news articles.
Achieved ROUGE-1/2/L scores of 44.16/21.56/41.30 on the cnn_dailymail dataset.
Academic paper summarization
Summary generation for academic papers, such as those from arXiv and PubMed.
Achieved ROUGE-1/2/L scores of 44.21/16.95/25.67 on the arxiv dataset.
Legal document summarization
Summary generation for legal documents, such as those in BigPatent.
Achieved ROUGE-1/2/L scores of 52.29/33.08/41.66 on the big_patent dataset.
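The figures above follow the usual ROUGE-1/2/L reporting convention. The following sketch (an assumption; the card does not specify the evaluation tooling) shows one common way to compute such scores with the evaluate library, given generated and reference summaries.

```python
# Sketch of ROUGE evaluation with the `evaluate` library (assumed tooling).
import evaluate

rouge = evaluate.load("rouge")

# Placeholder model outputs and gold summaries for illustration only.
predictions = ["The Eiffel Tower was the world's tallest structure for 41 years."]
references = ["For 41 years the Eiffel Tower was the tallest man-made structure."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys include rouge1, rouge2, rougeL (F-measures by default)
```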