🚀 PEGASUS for Financial Summarization
This model was fine-tuned on a novel financial news dataset of 2,000 articles from Bloomberg, covering topics such as stocks, markets, currencies, rates, and cryptocurrencies. It builds on the PEGASUS model, specifically the variant fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.
Note: This is the base version of the model. If you need an advanced model with significantly better performance, check out our [advanced version](https://rapidapi.com/medoid-ai-medoid-ai-default/api/financial-summarization-advanced) on Rapid API. Compared to the base model, the advanced model achieves ROUGE scores (similarity to human-generated summaries) that are over 16% higher. In addition, our advanced model offers several convenient plans suited to different use cases and workloads, ensuring a seamless experience for both individual and enterprise users.
🚀 Quick Start
Here is a simple code snippet showing how to use this model for financial summarization in PyTorch.
Basic usage
```python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load the fine-tuned financial summarization model and its tokenizer
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Example financial news article to summarize
text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."

# Tokenize the input and generate a summary with beam search
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_length=32,
    num_beams=5,
    early_stopping=True,
)

# Decode the generated token IDs back into text
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
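The snippet above assumes the required packages are already available. If not, a typical setup (note: `sentencepiece` is needed by the PEGASUS tokenizer) is:

```shell
pip install transformers torch sentencepiece
```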
📚 Detailed Documentation
Evaluation Results
The results before and after fine-tuning on our dataset are as follows:
| Fine-tuned | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-LSUM |
|------------|---------|---------|---------|------------|
| Yes        | 23.55   | 6.99    | 18.14   | 21.36      |
| No         | 13.8    | 2.4     | 10.63   | 12.03      |
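ROUGE scores measure n-gram overlap between generated summaries and human-written references. As a rough illustration of what ROUGE-1 captures, here is a minimal unigram-overlap F1 sketch (the figures in the table above were computed with a standard ROUGE implementation, not this simplified one):

```python
# Minimal ROUGE-1 F1 sketch (illustrative only).
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the bank agreed to buy rival",
                      "the bank will buy its rival"), 2))  # → 0.67
```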
Citation
You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing our paper:
T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021.
Towards Human-Centered Summarization: A Case Study on Financial News.
In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.
BibTeX entry:

```bibtex
@inproceedings{passali-etal-2021-towards,
    title = "Towards Human-Centered Summarization: A Case Study on Financial News",
    author = "Passali, Tatiana and Gidiotis, Alexios and Chatzikyriakidis, Efstathios and Tsoumakas, Grigorios",
    booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
    pages = "21--27",
}
```
Support
If you are interested in a more sophisticated version of the model, trained on more articles and tailored to your needs, contact us at info@medoid.ai!
More about Medoid AI: