PEGASUS for legal document summarization
legal-pegasus is a fine-tuned version of [google/pegasus-cnn_dailymail](https://huggingface.co/google/pegasus-cnn_dailymail) for the legal domain, trained for abstractive summarization. The maximum input sequence length is 1024 tokens.
Quick Start
This model is designed for abstractive summarization of legal documents. It can help users quickly extract key information from long legal texts.
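Because the input limit stated above is 1024 tokens, longer documents have to be truncated or split before summarization. The following is a minimal sketch of one possible splitting strategy, not something the model card prescribes: it greedily packs sentences into chunks that stay within a token budget. The names `chunk_by_token_budget` and `count_words` are hypothetical, and the word counter is only a stand-in for a real tokenizer-based count.

```python
from typing import Callable, List


def chunk_by_token_budget(
    sentences: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = 1024,
) -> List[str]:
    """Greedily pack sentences into chunks of at most max_tokens tokens."""
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        # A single sentence longer than the budget still gets its own chunk;
        # truncation in the tokenizer would then cut it at the limit.
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def count_words(text: str) -> int:
    """Stand-in counter for illustration: whitespace-separated words."""
    return len(text.split())


if __name__ == "__main__":
    demo = ["a b c", "d e", "f g h i", "j"]
    for chunk in chunk_by_token_budget(demo, count_words, max_tokens=5):
        print(chunk)
```

With the actual tokenizer, `count_tokens` could be something like `lambda s: len(tokenizer.encode(s))`. Per-sentence counts only approximate the token count of the joined chunk, so leaving some headroom below 1024 is a reasonable precaution. Each chunk would then be summarized separately.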
Features
- Domain-specific: fine-tuned for the legal domain, making it more suitable for summarizing legal documents.
- Abstractive summarization: generates summaries in its own words rather than copying sentences verbatim from the source text.
Installation
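To use this model, install the transformers library. The usage example below also needs PyTorch, since it requests PyTorch tensors; depending on your transformers version, the PEGASUS tokenizer may additionally need sentencepiece. You can install all three with the following command:
pip install transformers torch sentencepiece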
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("nsi319/legal-pegasus")
model = AutoModelForSeq2SeqLM.from_pretrained("nsi319/legal-pegasus")
text = """On March 5, 2021, the Securities and Exchange Commission charged AT&T, Inc. with repeatedly violating Regulation FD, and three of its Investor Relations executives with aiding and abetting AT&T's violations, by selectively disclosing material nonpublic information to research analysts. According to the SEC's complaint, AT&T learned in March 2016 that a steeper-than-expected decline in its first quarter smartphone sales would cause AT&T's revenue to fall short of analysts' estimates for the quarter. The complaint alleges that to avoid falling short of the consensus revenue estimate for the third consecutive quarter, AT&T Investor Relations executives Christopher Womack, Michael Black, and Kent Evans made private, one-on-one phone calls to analysts at approximately 20 separate firms. On these calls, the AT&T executives allegedly disclosed AT&T's internal smartphone sales data and the impact of that data on internal revenue metrics, despite the fact that internal documents specifically informed Investor Relations personnel that AT&T's revenue and sales of smartphones were types of information generally considered "material" to AT&T investors, and therefore prohibited from selective disclosure under Regulation FD. The complaint further alleges that as a result of what they were told on these calls, the analysts substantially reduced their revenue forecasts, leading to the overall consensus revenue estimate falling to just below the level that AT&T ultimately reported to the public on April 26, 2016. The SEC's complaint, filed in federal district court in Manhattan, charges AT&T with violations of the disclosure provisions of Section 13(a) of the Securities Exchange Act of 1934 and Regulation FD thereunder, and charges Womack, Evans and Black with aiding and abetting these violations. The complaint seeks permanent injunctive relief and civil monetary penalties against each defendant. The SEC's investigation was conducted by George N. 
Stepaniuk, Thomas Peirce, and David Zetlin-Jones of the SEC's New York Regional Office. The SEC's litigation will be conducted by Alexander M. Vasilescu, Victor Suthammanont, and Mr. Zetlin-Jones. The case is being supervised by Sanjay Wadhwa."""
# Tokenize the input, truncating it to the model's 1024-token limit.
input_tokenized = tokenizer.encode(text, return_tensors='pt', max_length=1024, truncation=True)
# Generate the summary with beam search.
summary_ids = model.generate(
    input_tokenized,
    num_beams=9,
    no_repeat_ngram_size=3,
    length_penalty=2.0,
    min_length=150,
    max_length=250,
    early_stopping=True,
)
# Decode the first (and only) generated sequence back to text.
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(summary)
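The generation call above sets length_penalty=2.0. As a rough illustration of what that parameter does, the sketch below follows the scoring rule described in the transformers documentation for beam search: the summed token log-probability is divided by the sequence length raised to length_penalty. The function name `beam_score` and the candidate numbers are made up for illustration; this is not the library's code.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized beam score: sum of log-probs / length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)


# Two hypothetical candidates with the same average log-prob per token (-0.5).
short_candidate = (-10.0, 20)  # (summed log-probability, length in tokens)
long_candidate = (-20.0, 40)

if __name__ == "__main__":
    for penalty in (0.0, 1.0, 2.0):
        s = beam_score(*short_candidate, penalty)
        l = beam_score(*long_candidate, penalty)
        print(f"length_penalty={penalty}: short={s:.4f}, long={l:.4f}")
```

Because log-probabilities are negative, dividing by a larger power of the length moves the score closer to zero, so a value above 1.0 such as 2.0 tilts beam search toward longer summaries, while 0.0 applies no normalization and favors the shorter candidate.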
Documentation
Training data
This model was trained on the sec-litigation-releases dataset, which consists of more than 2700 litigation releases and complaints.
Evaluation results
| Property | Details |
|---|---|
| Model Type | legal-pegasus |
| Training Data | sec-litigation-releases dataset with over 2700 litigation releases and complaints |

| Model | rouge1 | rouge1-precision | rouge2 | rouge2-precision | rougeL | rougeL-precision |
|---|---|---|---|---|---|---|
| legal-pegasus | 57.39 | 62.97 | 26.85 | 28.42 | 30.91 | 33.22 |
| pegasus-cnn_dailymail | 43.16 | 45.68 | 13.75 | 14.56 | 18.82 | 20.07 |
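To make the ROUGE-n columns above more concrete, here is a minimal sketch of n-gram overlap scoring. The model card does not say which ROUGE implementation produced the reported numbers, nor whether the plain rouge1 and rouge2 columns are recall or F-measure, so treat this only as an illustration of the idea. The function names are hypothetical, and tokenization is simple lowercased whitespace splitting.

```python
from collections import Counter
from typing import Dict, List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    """Precision, recall, and F1 of n-gram overlap between two texts."""
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = "the sec charged the company"
    gold = "the sec charged the firm and executives"
    print(rouge_n(generated, gold, n=1))
    print(rouge_n(generated, gold, n=2))
```

Precision is the share of the generated summary's n-grams that also appear in the reference, which is what the "-precision" columns would measure under this reading. The table values appear to be scaled to percentages.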
License
This project is licensed under the MIT License.