🚀 PEGASUS for Financial Summarization
This model was fine-tuned on a novel financial news dataset of 2,000 articles from Bloomberg, covering topics such as stocks, markets, currencies, rates, and cryptocurrencies. It builds on the PEGASUS model, specifically the variant fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu in PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization.
Note: This is the base version of the model. If you need an advanced model with significantly better performance, check out our [advanced version](https://rapidapi.com/medoid-ai-medoid-ai-default/api/financial-summarization-advanced) on Rapid API. Compared to the base model, the advanced model achieves ROUGE scores (similarity to human-generated summaries) that are over 16% higher. In addition, our advanced model offers several convenient plans suited to different use cases and workloads, ensuring a seamless experience for both individual and enterprise users.
🚀 Quick Start
Here is a simple code snippet showing how to use this model for financial summarization in PyTorch.
Basic usage
```python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load the fine-tuned financial summarization model and its tokenizer
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Example financial news article to summarize
text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."

# Tokenize the input and generate a summary with beam search
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_length=32,
    num_beams=5,
    early_stopping=True,
)

# Decode the generated token IDs back into text
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
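The snippet above assumes the required packages are already available. If not, a typical setup (note: `sentencepiece` is needed by the PEGASUS tokenizer) is:

```shell
pip install transformers torch sentencepiece
```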
📚 Detailed Documentation
Evaluation Results
The results before and after fine-tuning on our dataset are as follows:
| Fine-tuned | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-LSUM |
|------------|---------|---------|---------|------------|
| Yes        | 23.55   | 6.99    | 18.14   | 21.36      |
| No         | 13.8    | 2.4     | 10.63   | 12.03      |
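ROUGE scores measure n-gram overlap between generated summaries and human-written references. As a rough illustration of what ROUGE-1 captures, here is a minimal unigram-overlap F1 sketch (the figures in the table above were computed with a standard ROUGE implementation, not this simplified one):

```python
# Minimal ROUGE-1 F1 sketch (illustrative only).
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the bank agreed to buy rival",
                      "the bank will buy its rival"), 2))  # → 0.67
```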
Citation
You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing our paper:
T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021.
Towards Human-Centered Summarization: A Case Study on Financial News.
In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.
BibTeX entry:

```bibtex
@inproceedings{passali-etal-2021-towards,
    title = "Towards Human-Centered Summarization: A Case Study on Financial News",
    author = "Passali, Tatiana and Gidiotis, Alexios and Chatzikyriakidis, Efstathios and Tsoumakas, Grigorios",
    booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
    pages = "21--27",
}
```
Support
If you are interested in a more sophisticated version of the model, trained on more articles and tailored to your needs, contact us at info@medoid.ai!
More about Medoid AI: