🚀 PEGASUS for Financial Summarization
This model was fine-tuned on a novel financial news dataset consisting of 2,000 articles from Bloomberg, covering topics such as stocks, markets, currencies, interest rates, and cryptocurrencies. It is based on PEGASUS, specifically the checkpoint fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in *PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization*.
Note: this is the base version of the model. If you need a more advanced model with significantly improved performance, check out our [advanced version](https://rapidapi.com/medoid-ai-medoid-ai-default/api/financial-summarization-advanced) on Rapid API. Compared to the base model, the advanced model achieves over 16% higher ROUGE scores (similarity to human-written summaries). It also comes with several convenient plans tailored to different use cases and workloads, ensuring a seamless experience for both individual and enterprise users.
🚀 Quick Start
Below is a simple snippet showing how to use this model for financial summarization in PyTorch.
Basic usage
```python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load the fine-tuned checkpoint and its tokenizer
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."

# Tokenize the article and generate a short summary with beam search
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_length=32,
    num_beams=5,
    early_stopping=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
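Note that inputs longer than the model's maximum sequence length are truncated by the tokenizer, so very long articles lose their tail. One common workaround is to split a long article into sentence-level chunks and summarize each chunk separately. The helper below is our own illustration (not part of this model card), using a simple regex-based sentence split:

```python
import re

def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of whole sentences, each at most max_chars long."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Each chunk can then be passed to tokenizer/model.generate as above
chunks = chunk_text("First sentence. " * 100)
print(len(chunks), max(len(c) for c in chunks))
```

Chunking by characters is a rough proxy for the tokenizer's limit; for precise control you could count tokens with `tokenizer` instead.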
📚 Detailed Documentation
Evaluation results
Results before and after fine-tuning on our dataset:
| Fine-tuned | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-LSUM |
|------------|---------|---------|---------|------------|
| Yes        | 23.55   | 6.99    | 18.14   | 21.36      |
| No         | 13.8    | 2.4     | 10.63   | 12.03      |
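ROUGE-1 measures unigram overlap between a generated summary and a human-written reference. As a simplified sketch of the idea (the official `rouge-score` package additionally applies stemming and other normalization), the F1 variant can be computed as:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("ncb to buy samba for $15 billion",
                "ncb agrees to buy rival samba for $15 billion"))  # → 0.875
```

The scores in the table above were computed with the standard ROUGE tooling, not this sketch.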
Citation
You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing our paper:
T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021.
Towards Human-Centered Summarization: A Case Study on Financial News.
In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.
BibTeX entry:
```bibtex
@inproceedings{passali-etal-2021-towards,
    title = "Towards Human-Centered Summarization: A Case Study on Financial News",
    author = "Passali, Tatiana and Gidiotis, Alexios and Chatzikyriakidis, Efstathios and Tsoumakas, Grigorios",
    booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
    pages = "21--27",
}
```
Support
If you are interested in a more sophisticated version of the model, trained on more articles and tailored to your needs, contact us at info@medoid.ai!
More about Medoid AI: