🚀 PEGASUS for Financial Summarization
This model was fine-tuned on a novel financial news dataset consisting of 2,000 articles from Bloomberg, covering topics such as stocks, markets, currencies, interest rates, and cryptocurrencies. It is based on PEGASUS, specifically the checkpoint fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in *PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization*.
Note: this is the base version of the model. If you need a more advanced model with significantly improved performance, check out our [advanced version](https://rapidapi.com/medoid-ai-medoid-ai-default/api/financial-summarization-advanced) on Rapid API. Compared to the base model, the advanced model achieves over 16% higher ROUGE scores (similarity to human-written summaries). It also comes with several convenient plans tailored to different use cases and workloads, ensuring a seamless experience for both individual and enterprise users.
🚀 Quick Start
Below is a simple snippet showing how to use this model for financial summarization in PyTorch.
Basic usage
```python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load the fine-tuned checkpoint and its tokenizer
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."

# Tokenize the article and generate a short summary with beam search
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_length=32,
    num_beams=5,
    early_stopping=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
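Note that inputs longer than the model's maximum sequence length are truncated by the tokenizer, so very long articles lose their tail. One common workaround is to split a long article into sentence-level chunks and summarize each chunk separately. The helper below is our own illustration (not part of this model card), using a simple regex-based sentence split:

```python
import re

def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of whole sentences, each at most max_chars long."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Each chunk can then be passed to tokenizer/model.generate as above
chunks = chunk_text("First sentence. " * 100)
print(len(chunks), max(len(c) for c in chunks))
```

Chunking by characters is a rough proxy for the tokenizer's limit; for precise control you could count tokens with `tokenizer` instead.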
📚 Detailed Documentation
Evaluation results
Results before and after fine-tuning on our dataset:
| Fine-tuned | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-LSUM |
|------------|---------|---------|---------|------------|
| Yes        | 23.55   | 6.99    | 18.14   | 21.36      |
| No         | 13.8    | 2.4     | 10.63   | 12.03      |
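ROUGE-1 measures unigram overlap between a generated summary and a human-written reference. As a simplified sketch of the idea (the official `rouge-score` package additionally applies stemming and other normalization), the F1 variant can be computed as:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("ncb to buy samba for $15 billion",
                "ncb agrees to buy rival samba for $15 billion"))  # → 0.875
```

The scores in the table above were computed with the standard ROUGE tooling, not this sketch.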
Citation
You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing our paper:
T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021.
Towards Human-Centered Summarization: A Case Study on Financial News.
In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.
BibTeX entry:
```bibtex
@inproceedings{passali-etal-2021-towards,
    title = "Towards Human-Centered Summarization: A Case Study on Financial News",
    author = "Passali, Tatiana and Gidiotis, Alexios and Chatzikyriakidis, Efstathios and Tsoumakas, Grigorios",
    booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
    pages = "21--27",
}
```
Support
If you are interested in a more sophisticated version of the model, trained on more articles and tailored to your needs, contact us at info@medoid.ai!
More about Medoid AI: