Pegasus X Sumstew
Developed by Joemgu
An English long-text abstractive summarization model fine-tuned from Pegasus-x-large, supporting abstractive summarization of complex texts such as academic manuscripts and meeting transcripts
Downloads: 31
Released: 5/2/2023
Model Overview
This model was fine-tuned on a filtered subset of a mixture of the CNN-Dailymail, Samsum, Booksum, and Laysum datasets, and specializes in generating abstractive summaries of long texts.
Key Features
- Long-text processing: optimized for texts longer than 1,000 words, and handles complex content structures effectively.
- Multi-domain adaptability: trained on news, dialogue, books, academic papers, and other text types, giving it cross-domain generalization.
- Abstractive summarization: goes beyond extracting key sentences to produce abstractive summaries with semantic restructuring.
Capabilities
- Long-text summarization
- Multi-domain text understanding
- Semantic restructuring
Use Cases
- Academic research: generate accessible summaries of long research papers so that non-expert readers can quickly grasp the core content.
- Literary works: generate chapter or plot summaries of books, giving concise overviews of the key events.
- Business documents: distill lengthy meeting transcripts into concise summaries of key decisions and action items.
🚀 Pegasus-x-sumstew
Pegasus-x-sumstew is fine-tuned from Pegasus-x-large and produces concise, accurate abstractive summaries of long texts.
🚀 Quick Start
You can use this model with the `pipeline` function from the `transformers` library:
```python
from transformers import pipeline

summarizer = pipeline("summarization", "joemgu/pegasus-x-sumstew")

text = "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice 'without pictures or conversations?' So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, 'Oh dear! Oh dear! I shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. In another moment down went Alice after it, never once considering how in the world she was to get out again."

summary = summarizer(
    text,
    num_beams=8,
    repetition_penalty=3.5,
    no_repeat_ngram_size=4,
    encoder_no_repeat_ngram_size=4,
)[0]["summary_text"]

print(summary)
```
Output:
Alice is a bored and curious girl who follows a White Rabbit with a watch into a rabbit-hole. She enters a strange world where she has many adventures and meets many peculiar creatures.
✨ Main Features
- Generates abstractive summaries of long English texts.
- Suitable for summarizing long documents such as academic transcripts, meeting minutes, and literary works.
📦 Installation
The original documentation does not include installation steps.
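A typical setup, assuming a standard Python environment with pip available (this is an assumption, not part of the original card; `sentencepiece` is included because Pegasus tokenizers generally depend on it), would be:

```shell
# Install the Transformers library, a PyTorch backend, and the tokenizer dependency
pip install transformers torch sentencepiece
```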
📚 Documentation
Model Description
This model is a version of Pegasus-x-large fine-tuned on a filtered subset of the CNN-Dailymail, Samsum, Booksum, and Laysum datasets. It generates abstractive summaries of long texts.
Intended Uses & Limitations
The model can be used to summarize long English texts such as academic transcripts, meeting minutes, or literary works. It is not suitable for summarizing short texts such as tweets, headlines, or captions. If the input text contains factual errors, slang, or offensive language, the model may produce inaccurate or biased summaries.
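Since the model targets long-form inputs, a minimal pre-check can route short texts (tweets, headlines, captions) away from the summarizer. This is a hypothetical helper using simple whitespace word counting, not part of the original card, and the threshold is an illustrative assumption:

```python
def is_long_enough(text: str, min_words: int = 100) -> bool:
    """Return True if the text is long enough to be worth summarizing.

    The card recommends the model for long-form inputs, so very short
    texts such as tweets or headlines should be skipped.
    """
    return len(text.split()) >= min_words

# Short inputs like headlines are out of scope for this model.
print(is_long_enough("Breaking: markets rally"))  # False
print(is_long_enough("word " * 250))              # True
```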
Training Data
The model was fine-tuned on a filtered subset of the CNN-Dailymail, Samsum, Booksum, and Laysum datasets, which contain various types of texts paired with abstractive summaries. The filtered subset keeps only examples whose source text is longer than 1,000 words and whose summary is shorter than 100 words, for a total of roughly 150,000 examples.
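The filtering rule described above (sources longer than 1,000 words, summaries shorter than 100 words) can be sketched as a simple predicate. The function name and whitespace word counting are illustrative assumptions, not the authors' actual preprocessing code:

```python
def keep_example(source: str, summary: str) -> bool:
    """Mimic the described filter: keep long sources with short summaries."""
    return len(source.split()) > 1000 and len(summary.split()) < 100

examples = [
    ("long article " * 600, "a short summary"),  # ~1200 source words -> kept
    ("a short note", "a short summary"),         # source too short -> dropped
]
kept = [ex for ex in examples if keep_example(*ex)]
print(len(kept))  # 1
```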
Limitations and Bias
The model may inherit limitations and biases from the pretrained Pegasus-x-large model and from the fine-tuning datasets. Possible sources of bias include:
- The pretrained Pegasus-x-large model was trained on a large corpus of English text from various sources, which may not reflect the diversity and nuance of different languages and cultures.
- The fine-tuning datasets were collected from different domains and genres, each with its own stylistic conventions and perspectives on certain topics and events.
- The fine-tuning datasets contain only abstractive summaries, which may not capture all of the important information and nuance of the original texts.
- The fine-tuning datasets cover only texts from particular time periods and sources, which may not reflect current events and trends.
Users should therefore be aware of these limitations and biases and evaluate the model's performance and suitability for their specific use case.
📄 License
This model is released under the Apache-2.0 license.