legal-pegasus: an open-source legal text summarization model, free to deploy, for generating abstractive summaries of legal documents

Legal Pegasus

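Developed by nsi319

A legal-domain text summarization model fine-tuned from PEGASUS, built to generate abstractive summaries of legal documents

Text Generation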

Transformers

Language: English | License: MIT | Tags: legal document summarization, abstractive summarization, SEC litigation releases

Downloads: 2,658

Released: 3/2/2022

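Model Overview

This model is a legal-domain version of google/pegasus-cnn_dailymail, fine-tuned for abstractive summarization of legal documents. It accepts input sequences of up to 1024 tokens.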

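Model Features

Legal-domain optimization

Fine-tuned specifically on legal documents, so it is better at understanding and summarizing legal content

Long-text support

Accepts input sequences of up to 1024 tokens, which suits longer legal documents

High-quality summaries

Compared with the base model, ROUGE scores on legal document summarization improve markedly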

Model Capabilities

Legal text understanding

Abstractive summary generation

Long-text processing

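Use Cases

Legal document processing

Litigation release summarization

Automatically generates concise summaries of litigation releases

Reported ROUGE-1 score of 57.39, markedly better than general-purpose summarization models

Complaint summarization

Extracts key information from complex complaints and generates a summary

Reported ROUGE-L score of 30.91, with legal terminology preserved accurately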
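To make the ROUGE figures above easier to interpret, here is a minimal sketch of how ROUGE-1 and ROUGE-L F1 scores can be computed. The page does not say which ROUGE implementation or variant produced 57.39 and 30.91, so the function names (`rouge_1_f1`, `rouge_l_f1`) and the lowercase, whitespace-based tokenization are assumptions for illustration. Standard toolkits add steps such as stemming, so this sketch will not reproduce the reported numbers.

```python
from collections import Counter


def _tokens(text):
    # Assumption: lowercase whitespace tokenization, with no stemming.
    return text.lower().split()


def rouge_1_f1(reference, candidate):
    """F1 over unigram overlap, with counts clipped per token."""
    ref = Counter(_tokens(reference))
    cand = Counter(_tokens(candidate))
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """F1 based on the longest common subsequence of tokens."""
    ref, cand = _tokens(reference), _tokens(candidate)
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Both functions return a value between 0 and 1. Scores such as 57.39 are usually the same quantity multiplied by 100.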

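🚀 PEGASUS for Legal Document Summarization

legal-pegasus is a version of google/pegasus-cnn_dailymail fine-tuned on the legal domain and trained to perform abstractive summarization. The maximum input sequence length is 1024 tokens.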
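Documents longer than 1024 tokens are truncated when encoded with `truncation=True`. One common workaround, not described by the model author, is to split the token IDs into overlapping windows and summarize each window separately. The sketch below shows only the splitting step. The helper name `split_into_windows` and the `overlap` default are hypothetical, and because a tokenizer may append an end-of-sequence token, a window slightly smaller than 1024 may be needed in practice.

```python
def split_into_windows(token_ids, max_tokens=1024, overlap=64):
    """Split a list of token IDs into windows of at most max_tokens.

    Consecutive windows share `overlap` tokens so that a sentence cut
    at a window boundary still appears whole in one of the windows.
    Hypothetical helper; not part of transformers.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    step = max_tokens - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_tokens])
        # Stop once the current window reaches the end of the input.
        if start + max_tokens >= len(token_ids):
            break
    return windows
```

Each window can then be converted to a tensor and passed to `model.generate`. How the per-window summaries are merged afterwards is a separate design choice that this sketch leaves open.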

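🚀 Quick Start

Install dependencies

This project depends on the transformers library. The example below also returns PyTorch tensors, and the PEGASUS tokenizer typically requires sentencepiece, so installing all three is a reasonable starting point:

pip install transformers torch sentencepiece

Code example

The following example uses the model to summarize a legal document: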

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("nsi319/legal-pegasus")  
model = AutoModelForSeq2SeqLM.from_pretrained("nsi319/legal-pegasus")


text = """On March 5, 2021, the Securities and Exchange Commission charged AT&T, Inc. with repeatedly violating Regulation FD, and three of its Investor Relations executives with aiding and abetting AT&T's violations, by selectively disclosing material nonpublic information to research analysts. According to the SEC's complaint, AT&T learned in March 2016 that a steeper-than-expected decline in its first quarter smartphone sales would cause AT&T's revenue to fall short of analysts' estimates for the quarter. The complaint alleges that to avoid falling short of the consensus revenue estimate for the third consecutive quarter, AT&T Investor Relations executives Christopher Womack, Michael Black, and Kent Evans made private, one-on-one phone calls to analysts at approximately 20 separate firms. On these calls, the AT&T executives allegedly disclosed AT&T's internal smartphone sales data and the impact of that data on internal revenue metrics, despite the fact that internal documents specifically informed Investor Relations personnel that AT&T's revenue and sales of smartphones were types of information generally considered "material" to AT&T investors, and therefore prohibited from selective disclosure under Regulation FD. The complaint further alleges that as a result of what they were told on these calls, the analysts substantially reduced their revenue forecasts, leading to the overall consensus revenue estimate falling to just below the level that AT&T ultimately reported to the public on April 26, 2016. The SEC's complaint, filed in federal district court in Manhattan, charges AT&T with violations of the disclosure provisions of Section 13(a) of the Securities Exchange Act of 1934 and Regulation FD thereunder, and charges Womack, Evans and Black with aiding and abetting these violations. The complaint seeks permanent injunctive relief and civil monetary penalties against each defendant. The SEC's investigation was conducted by George N. Stepaniuk, Thomas Peirce, and David Zetlin-Jones of the SEC's New York Regional Office. The SEC's litigation will be conducted by Alexander M. Vasilescu, Victor Suthammanont, and Mr. Zetlin-Jones. The case is being supervised by Sanjay Wadhwa."""

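# Tokenize the input, truncating it to the model's 1024-token limit
input_tokenized = tokenizer.encode(text, return_tensors='pt', max_length=1024, truncation=True)

# Generate the summary with beam search
summary_ids = model.generate(input_tokenized,
                             num_beams=9,             # beam width
                             no_repeat_ngram_size=3,  # never repeat the same 3-gram
                             length_penalty=2.0,      # values above 0 favor longer sequences
                             min_length=150,          # minimum summary length in tokens
                             max_length=250,          # maximum summary length in tokens
                             early_stopping=True)

# Decode the first (and only) returned sequence back into text
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(summary)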
### Summary Output

# The Securities and Exchange Commission today charged AT&T, Inc. and three of its Investor Relations executives with aiding and abetting the company's violations of the antifraud provisions of Section 10(b) of the Securities Exchange Act of 1934 and Rule 10b-5 thereunder. According to the SEC's complaint, the company learned in March 2016 that a steeper-than-expected decline in its first quarter smartphone sales would cause its revenue to fall short of analysts' estimates for the quarter. The complaint alleges that to avoid falling short of the consensus revenue estimate for the third consecutive quarter, the executives made private, one-on-one phone calls to analysts at approximately 20 separate firms. On these calls, the SEC alleges that Christopher Womack, Michael Black, and Kent Evans allegedly disclosed internal smartphone sales data and the impact of that data on internal revenue metrics. The SEC further alleges that as a result of what they were told, the analysts substantially reduced their revenue forecasts, leading to the overall consensus Revenue Estimate falling to just below the level that AT&t ultimately reported to the public on April 26, 2016. The SEC is seeking permanent injunctive relief and civil monetary penalties against each defendant.