Multi-purpose Summarizer (Fine-tuned 11B google/flan-t5-xxl on several summarization datasets)
This is a fine-tuned version of [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) trained on several summarization datasets (xsum, wikihow, cnn_dailymail/3.0.0, samsum, scitldr/AIC, billsum, TLDR, wikipedia-summary). 70% of the data was additionally filtered with the Contriever, using a cosine similarity of 0.6 between text and summary as the threshold.
The goal is a model that can serve as a general-purpose summarizer for academic and general usage. The type of summary can be controlled by varying the instruction prepended to the source document. Although the model was trained with a maximum source length of 512 tokens and a maximum summary length of 150 tokens, it works well on a wide range of texts.
🚀 Quick Start
See the Colab notebook for example usage.
✨ Features
- General-Purpose Summarization: can be used for various types of text summarization, suitable for both academic and general scenarios.
- Prompt-Controlled Summaries: the type of summary can be controlled by prepending different prompts to the source document.
- Data Filtering: 70% of the training data was filtered with the Contriever, using a cosine similarity threshold of 0.6.
📦 Installation
After `pip install transformers`, you can start using the model.
💻 Usage Examples
Basic Usage
The model expects a prompt prepended to the source document to indicate the type of summary. It was trained with a large variety (hundreds) of prompts, for example:
```python
example_prompts = {
    "social": "Produce a short summary of the following social media post:",
    "ten": "Summarize the following article in 10-20 words:",
    "5": "Summarize the following article in 0-5 words:",
    "100": "Summarize the following article in about 100 words:",
    "summary": "Write a ~100 word summary of the following text:",
    "short": "Provide a short summary of the following article:",
}
```
The model has also learned to handle summary lengths specified in words, either as a range ("x-y words") or as an approximate target (e.g. "~/approximately/about x words"). Prompts should end with a colon, so that the input to the model is formatted as, e.g., "Summarize the following: \n\n {input_text}".
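For illustration, here is a minimal sketch of how a prompt and a source document can be combined into that input format. The `build_input` helper and the sample document are hypothetical, not part of the model's API:

```python
# Hypothetical helper: joins a prompt and a source document into the
# "<prompt> \n\n <document>" format described above.
def build_input(prompt: str, document: str) -> str:
    return f"{prompt} \n\n {document}"

text = build_input(
    example_prompts["ten"],  # "Summarize the following article in 10-20 words:"
    "You must be 18 years old to live or work in New York State...",
)
```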
Advanced Usage
After installation, run the following code:
```python
import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    "jordiclive/flan-t5-11b-summarizer-filtered",
    torch_dtype=torch.bfloat16,
)

raw_document = "You must be 18 years old to live or work in New York State..."
prompt = "Summarize the following article in 10-20 words:"
results = summarizer(
    f"{prompt} \n\n {raw_document}",
    num_beams=5,
    min_length=5,
    no_repeat_ngram_size=3,
    truncation=True,
    max_length=512,
)
print(results[0]["summary_text"])
```
Note that this pipeline runs more slowly than the Colab notebook and does not expose some of its tokenization parameters.
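For reference, here is a rough sketch of an equivalent explicit generation loop with `AutoTokenizer`/`AutoModelForSeq2SeqLM`. The generation arguments simply mirror the pipeline call above; the exact settings used in the Colab notebook may differ:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "jordiclive/flan-t5-11b-summarizer-filtered"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

prompt = "Summarize the following article in 10-20 words:"
raw_document = "You must be 18 years old to live or work in New York State..."

# Truncate the source to the 512-token limit used during training.
inputs = tokenizer(
    f"{prompt} \n\n {raw_document}",
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
outputs = model.generate(
    **inputs,
    num_beams=5,
    min_length=5,
    no_repeat_ngram_size=3,
    max_length=512,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```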
📚 Documentation
Training procedure
- Training was done in BF16 with DeepSpeed ZeRO stage 2 and CPU offload, for 1 epoch with validation loss monitored (an illustrative configuration sketch is shown below).
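For illustration only, a minimal sketch of what a ZeRO stage-2 configuration with CPU offload and BF16 looks like when passed to DeepSpeed. The exact configuration used for this model is not published here; the values below are placeholders taken from the hyperparameters listed further down:

```python
# Illustrative DeepSpeed config (placeholder values, not the exact config used).
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "gradient_accumulation_steps": 2,
    "train_micro_batch_size_per_gpu": 4,
}
```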
Hardware
- GPU count: 8 × NVIDIA A100-SXM4-80GB
- CPU count: 48
Training hyperparameters
The following hyperparameters were used during training:
| Property | Details |
|---|---|
| learning_rate | 3e-05 |
| train_batch_size | 4 |
| eval_batch_size | 4 |
| seed | 42 |
| distributed_type | multi-GPU |
| gradient_accumulation_steps | 2 |
| effective_train_batch_size | 64 |
| optimizer | Adam with betas=(0.9, 0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| warmup_steps | 2000 |
| num_epochs | 4 |
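The effective train batch size follows from the per-device batch size, GPU count, and gradient accumulation: 4 × 8 GPUs × 2 accumulation steps = 64.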
Framework versions
| Property | Details |
|---|---|
| Transformers | 4.24.0 |
| Pytorch | 1.9.1+cu111 |
| Deepspeed | 0.7.4 |
| Pytorch-lightning | 1.8.1 |
📄 License
This project is licensed under the Apache-2.0 and BSD-3-Clause licenses.