Pegasus X Sumstew
By Joemgu
An English long-text abstractive summarization model fine-tuned from Pegasus-x-large, supporting abstractive summary generation for complex texts such as academic transcripts and meeting minutes.
Released: 5/2/2023
Model Overview
The model was fine-tuned on a filtered subset of a mix of the CNN/DailyMail, Samsum, Booksum, and Laysum datasets, and specializes in generating abstractive summaries of long texts.
Model Features
Long-text handling
Optimized for texts longer than 1,000 words; handles complex content structures effectively.
Multi-domain adaptability
Trained on multiple text types including news, dialogue, books, and academic papers, giving it cross-domain generalization.
Abstractive summarization
Goes beyond extracting key sentences to generate abstractive summaries with semantic restructuring.
Model Capabilities
Long-text summarization
Multi-domain text understanding
Semantic restructuring
Use Cases
Academic research
Research paper summarization
Generates accessible summaries of long academic papers
Helps non-specialist readers quickly grasp a paper's core content
Literary works
Book chapter summarization
Generates plot summaries of literary works
Produces overviews of key plot points that preserve the original style
Business documents
Meeting-minutes summarization
Extracts decision points from lengthy meeting records
Generates concise summaries covering key decisions and action items
🚀 Pegasus-x-sumstew
Pegasus-x-sumstew is fine-tuned from the Pegasus-x-large model. It performs abstractive summarization of long texts, producing concise and accurate summaries.
🚀 Quick Start
You can use this model with the `pipeline` function from the `transformers` library:
```python
from transformers import pipeline

summarizer = pipeline("summarization", "joemgu/pegasus-x-sumstew")

text = "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice 'without pictures or conversations?' So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, 'Oh dear! Oh dear! I shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. In another moment down went Alice after it, never once considering how in the world she was to get out again."

summary = summarizer(
    text,
    num_beams=8,
    repetition_penalty=3.5,
    no_repeat_ngram_size=4,
    encoder_no_repeat_ngram_size=4,
)[0]["summary_text"]

print(summary)
```
Output:
Alice is a bored and curious girl who follows a White Rabbit with a watch into a rabbit-hole. She enters a strange world where she has many adventures and meets many peculiar creatures.
✨ Key Features
- Generates abstractive summaries of long English texts.
- Suitable for summarizing long documents such as academic transcripts, meeting minutes, and literary works.
💻 Usage Example
Basic Usage
```python
from transformers import pipeline

summarizer = pipeline("summarization", "joemgu/pegasus-x-sumstew")

text = "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice 'without pictures or conversations?' So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, 'Oh dear! Oh dear! I shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. In another moment down went Alice after it, never once considering how in the world she was to get out again."

summary = summarizer(
    text,
    num_beams=8,
    repetition_penalty=3.5,
    no_repeat_ngram_size=4,
    encoder_no_repeat_ngram_size=4,
)[0]["summary_text"]

print(summary)
```
📚 Documentation
Model Description
This model is a fine-tuned version of the Pegasus-x-large model, trained on a filtered subset of the CNN-Dailymail, Samsum, Booksum, and Laysum datasets. It can generate abstractive summaries of long texts.
Intended Uses & Limitations
The model can be used to summarize long English texts, such as academic transcripts, meeting minutes, or literary works. It is not suitable for summarizing short texts such as tweets, headlines, or captions. If the input text contains factual errors, slang, or offensive language, the model may produce inaccurate or biased summaries.
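Since short inputs such as tweets or headlines are out of scope, a caller might screen them out before invoking the summarizer. This is a minimal sketch using a simple word-count heuristic; `is_suitable_input` and the 100-word threshold are illustrative assumptions, not part of the model card:

```python
def is_suitable_input(text: str, min_words: int = 100) -> bool:
    """Heuristic guard: the model targets long documents, not tweets,
    headlines, or captions. Rejects inputs below a word-count threshold."""
    return len(text.split()) >= min_words
```

In practice, a token-based length check with the model's own tokenizer would be more faithful than counting whitespace-separated words.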
Training Data
The model was fine-tuned on a filtered subset of the CNN-Dailymail, Samsum, Booksum, and Laysum datasets. These datasets contain various types of text along with their abstractive summaries. The filtered subset includes only texts longer than 1,000 words whose summaries are shorter than 100 words, for a total of roughly 150,000 examples.
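The stated filtering rule (documents over 1,000 words, summaries under 100 words) can be sketched as a simple predicate. This is a hypothetical reconstruction for illustration; `keep_example` is not part of the published training code:

```python
def keep_example(document: str, summary: str) -> bool:
    """Filter predicate mirroring the described subset criteria:
    keep only documents longer than 1,000 words whose reference
    summaries are shorter than 100 words."""
    return len(document.split()) > 1000 and len(summary.split()) < 100
```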
Limitations and Bias
The model may inherit some limitations and biases from the pre-trained Pegasus-x-large model and from the fine-tuning datasets. Possible sources of bias include:
- The pre-trained Pegasus-x-large model was trained on a large corpus of English text from various sources, which may not reflect the diversity and nuances of other languages and cultures.
- The fine-tuning datasets were collected from different domains and genres, each with its own stylistic conventions and perspectives on certain topics and events.
- The fine-tuning datasets contain only abstractive summaries, which may not capture all of the important information and nuance in the source texts.
- The fine-tuning datasets cover texts from specific time periods and sources, which may not reflect current events and trends.
Users should therefore be aware of these limitations and biases, and evaluate the model's performance and suitability for their specific use case.
📄 License
This model is released under the Apache-2.0 license.