# BART_no_extraction_V2 Model
This model is a fine-tuned BART model designed for summarizing long legal documents. It uses a multi-step approach that first compresses complex legal texts and then summarizes them, making information extraction more efficient.
## Quick Start

### Prerequisites

```bash
pip install bert-extractive-summarizer
```
### Example Code

```python
from summarizer import Summarizer
from transformers import pipeline

# Step 1: extractive summarization compresses the source document
extractive_model = Summarizer()
text = 'Original document text to be summarized'
extractive_summary = extractive_model(text)

# Step 2: abstractive summarization produces the final summary
abstractive_model = pipeline('summarization', model='MikaSie/BART_no_extraction_V2', tokenizer='MikaSie/BART_no_extraction_V2')
result = abstractive_model(extractive_summary)
```
## Features

- Multi-step Summarization: First uses extractive summarization to compress the text, then abstractive summarization to generate the final summary.
- Trained on Legal Data: Trained on the [EUR-Lex-Sum](https://huggingface.co/datasets/dennlinger/eur-lex-sum) dataset, making it suitable for long legal document summarization.
## Model Details

### Model Information

| Property | Details |
|----------|---------|
| Model Type | Fine-tuned BART |
| Training Data | [EUR-Lex-Sum](https://huggingface.co/datasets/dennlinger/eur-lex-sum) |
| Language | English |
| Finetuned from model | BART |
### Model Performance

| Metric | Value |
|--------|-------|
| ROUGE-1 | 0.45896475125147945 |
| ROUGE-2 | 0.19536601211197768 |
| ROUGE-L | 0.21743714120418117 |
| BERTScore | 0.8702187434921052 |
| BARTScore | -3.4154099993527196 |
| BLANC | 0.10292031328498923 |
### Model Sources
- Repository: https://github.com/MikaSie/Thesis
- Paper: PAPER_LINK
- Streamlit demo: STREAMLIT_LINK
## Usage Examples

### Basic Usage

```python
from summarizer import Summarizer
from transformers import pipeline

# Step 1: extractive summarization compresses the source document
extractive_model = Summarizer()
text = 'Original document text to be summarized'
extractive_summary = extractive_model(text)

# Step 2: abstractive summarization produces the final summary
abstractive_model = pipeline('summarization', model='MikaSie/BART_no_extraction_V2', tokenizer='MikaSie/BART_no_extraction_V2')
result = abstractive_model(extractive_summary)
```
### Advanced Usage

To obtain optimal results, it is recommended to pre-process the input text with an extractive summarization tool. You can try different extractive-abstractive model combinations, which can be found at https://huggingface.co/MikaSie.
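The extractive-then-abstractive flow described above can be sketched as a small helper that accepts any pair of summarization callables, so different combinations can be swapped in. The function name and the stub summarizers below are illustrative only, standing in for `Summarizer()` and the BART pipeline; they are not part of the released code:

```python
def two_step_summarize(text, extractive_fn, abstractive_fn):
    """Compress `text` with an extractive step, then summarize abstractively."""
    compressed = extractive_fn(text)
    return abstractive_fn(compressed)

# Stub summarizers standing in for Summarizer() and the transformers pipeline:
extractive = lambda t: ' '.join(t.split('. ')[:2])  # keep the first two sentences
abstractive = lambda t: t[:60]                      # placeholder "abstractive" step

summary = two_step_summarize('First point. Second point. Third point.',
                             extractive, abstractive)
print(summary)  # → First point Second point
```

Because the helper only depends on the two callables, any extractive model (e.g. a BERT-based one with a different compression ratio) can be dropped in without changing the abstractive side.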
## Documentation

### Model Description

This model is a fine-tuned version of BART. The underlying research investigates a multi-step summarization approach for long legal documents. Many decisions in the renewable energy space depend heavily on regulations, but these regulations are often long and complicated. The proposed architecture first applies one or more extractive summarization steps to compress the source text before the abstractive summarization model produces the final summary. This particular model ("no_extraction") was fine-tuned on the dataset without an extractive pre-processing model or compression ratio. The research used multiple extractive-abstractive model combinations, which can be found at https://huggingface.co/MikaSie. To obtain optimal results, feed the model an extractive summary as input, as it was designed this way!
### Out-of-Scope Use

Using this model without an extractive summarization step may not yield optimal results. It is recommended to follow the proposed multi-step summarization approach outlined in the model description for best performance.
## Technical Details

The model uses a multi-step approach. First, it applies extractive summarization to compress the long legal text, reducing its length and complexity. Then, the abstractive summarization model generates the final summary. The fine-tuned abstractive model has been trained on a pre-processed dataset to ensure better performance on long legal documents.
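As a rough illustration of why the compression step matters: BART-style encoders typically accept on the order of 1,024 tokens, while EUR-Lex documents are usually far longer. The hypothetical helpers below (not part of the released code, and using whitespace word counts rather than the model's real subword tokenizer) make the mismatch concrete:

```python
def fits_budget(text, budget=1024):
    """Rough check (whitespace tokens) of whether text fits a BART-sized input window."""
    return len(text.split()) <= budget

def truncate_to_budget(text, budget=1024):
    """Naive fallback: keep only the first `budget` whitespace tokens."""
    words = text.split()
    return ' '.join(words[:budget])

long_doc = 'word ' * 3000  # a stand-in for a lengthy regulation
print(fits_budget(long_doc))                       # False: the raw document overflows the window
print(len(truncate_to_budget(long_doc).split()))   # 1024
```

Naive truncation discards everything past the cutoff, which is why an extractive step, which selects salient sentences from the whole document instead, is the recommended way to get under the budget.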
## License
No license information provided in the original document.
## Important Note
As with any language model, this model may inherit biases present in the training data. It is important to be aware of potential biases in the source text and to critically evaluate the generated summaries.
## Usage Tips
- Carefully review and validate the generated summaries before relying on them for critical tasks.
- Consider using the model in conjunction with human review or other validation mechanisms to ensure the accuracy and completeness of the summaries.
- Experiment with different extractive summarization models or techniques to find the most suitable input for the abstractive model.
- Provide feedback and contribute to the ongoing research and development of the model to help improve its performance and address its limitations.
- Any actions taken based on this content are at your own risk.