Multi-purpose Summarizer (Fine-tuned 11B google/flan-t5-xxl on several summarization datasets)
This is a fine-tuned version of [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) trained on several summarization datasets (xsum, wikihow, cnn_dailymail/3.0.0, samsum, scitldr/AIC, billsum, TLDR, wikipedia-summary). 70% of the data was additionally filtered with the Contriever, using a cosine similarity of 0.6 between text and summary as the threshold.
The goal is a model that can serve as a general-purpose summarizer for academic and general usage. The type of summary can be controlled by varying the instruction prepended to the source document. Although the model was trained with a maximum source length of 512 tokens and a maximum summary length of 150 tokens, it works well on a wide range of texts.
🚀 Quick Start
See the Colab notebook for example usage.
✨ Features
- General-Purpose Summarization: can be used for various types of text summarization, suitable for both academic and general scenarios.
- Prompt-Controlled Summaries: the type of summary can be controlled by prepending different prompts to the source document.
- Data Filtering: 70% of the training data was filtered with the Contriever, using a cosine similarity threshold of 0.6.
📦 Installation
After `pip install transformers`, you can start using the model.
💻 Usage Examples
Basic Usage
The model expects a prompt prepended to the source document to indicate the type of summary. It was trained with a large variety (hundreds) of prompts, for example:
```python
example_prompts = {
    "social": "Produce a short summary of the following social media post:",
    "ten": "Summarize the following article in 10-20 words:",
    "5": "Summarize the following article in 0-5 words:",
    "100": "Summarize the following article in about 100 words:",
    "summary": "Write a ~100 word summary of the following text:",
    "short": "Provide a short summary of the following article:",
}
```
The model has also learned to handle summary lengths specified in words, either as a range ("x-y words") or as an approximate target (e.g. "~/approximately/about x words"). Prompts should end with a colon, so that the input to the model is formatted as, e.g., "Summarize the following: \n\n {input_text}".
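For illustration, here is a minimal sketch of how a prompt and a source document can be combined into that input format. The `build_input` helper and the sample document are hypothetical, not part of the model's API:

```python
# Hypothetical helper: joins a prompt and a source document into the
# "<prompt> \n\n <document>" format described above.
def build_input(prompt: str, document: str) -> str:
    return f"{prompt} \n\n {document}"

text = build_input(
    example_prompts["ten"],  # "Summarize the following article in 10-20 words:"
    "You must be 18 years old to live or work in New York State...",
)
```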
Advanced Usage
After installation, run the following code:
```python
import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    "jordiclive/flan-t5-11b-summarizer-filtered",
    torch_dtype=torch.bfloat16,
)

raw_document = "You must be 18 years old to live or work in New York State..."
prompt = "Summarize the following article in 10-20 words:"
results = summarizer(
    f"{prompt} \n\n {raw_document}",
    num_beams=5,
    min_length=5,
    no_repeat_ngram_size=3,
    truncation=True,
    max_length=512,
)
print(results[0]["summary_text"])
```
Note that this pipeline runs more slowly than the Colab notebook and does not expose some of its tokenization parameters.
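For reference, here is a rough sketch of an equivalent explicit generation loop with `AutoTokenizer`/`AutoModelForSeq2SeqLM`. The generation arguments simply mirror the pipeline call above; the exact settings used in the Colab notebook may differ:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "jordiclive/flan-t5-11b-summarizer-filtered"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

prompt = "Summarize the following article in 10-20 words:"
raw_document = "You must be 18 years old to live or work in New York State..."

# Truncate the source to the 512-token limit used during training.
inputs = tokenizer(
    f"{prompt} \n\n {raw_document}",
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
outputs = model.generate(
    **inputs,
    num_beams=5,
    min_length=5,
    no_repeat_ngram_size=3,
    max_length=512,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```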
📚 Documentation
Training procedure
- Training was done in BF16 with DeepSpeed ZeRO stage 2 and CPU offload, for 1 epoch with validation loss monitored (an illustrative configuration sketch is shown below).
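For illustration only, a minimal sketch of what a ZeRO stage-2 configuration with CPU offload and BF16 looks like when passed to DeepSpeed. The exact configuration used for this model is not published here; the values below are placeholders taken from the hyperparameters listed further down:

```python
# Illustrative DeepSpeed config (placeholder values, not the exact config used).
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "gradient_accumulation_steps": 2,
    "train_micro_batch_size_per_gpu": 4,
}
```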
Hardware
- GPU count: 8 × NVIDIA A100-SXM4-80GB
- CPU count: 48
Training hyperparameters
The following hyperparameters were used during training:
| Property | Details |
|---|---|
| learning_rate | 3e-05 |
| train_batch_size | 4 |
| eval_batch_size | 4 |
| seed | 42 |
| distributed_type | multi-GPU |
| gradient_accumulation_steps | 2 |
| effective_train_batch_size | 64 |
| optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| warmup_steps | 2000 |
| num_epochs | 4 |
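The effective train batch size follows from the per-device batch size, GPU count, and gradient accumulation: 4 × 8 GPUs × 2 accumulation steps = 64.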
Framework versions
| Property | Details |
|---|---|
| Transformers | 4.24.0 |
| Pytorch | 1.9.1+cu111 |
| Deepspeed | 0.7.4 |
| Pytorch-lightning | 1.8.1 |
📄 License
This project is licensed under the Apache-2.0 and BSD-3-Clause licenses.