Bart-Large Summarization Model
This repository houses the Bart-large-paper2slides-summarizer model. It is fine-tuned on the Automatic Slide Generation from Scientific Papers dataset using the unsupervised learning techniques from the paper 'Unsupervised Machine Translation Using Monolingual Corpora Only'. The model is designed to summarize scientific texts accurately, and it is trained in parallel with the Bart-large-paper2slides-expander model by the same contributor.
Features
- The model recounts elements of classic bag-of-features models and introduces simpler DNN-based BagNets.
- It aims to reduce sequential computation, similar to models such as the Extended Neural GPU, ByteNet, and ConvS2S, but handles long-distance dependencies more efficiently through self-attention.
- It can summarize scientific texts accurately and is fine-tuned on a dedicated slide generation dataset.
Installation
To use this model, you need to install the `transformers` library. You can install it using `pip`:

```bash
pip install transformers
```
Usage Examples
Basic Usage
```python
from transformers import BartTokenizer, BartForConditionalGeneration, pipeline

model_name = "com3dian/Bart-large-paper2slides-summarizer"

# Load the tokenizer and model.
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Generate a summary for a piece of input text.
input_text = "Your input text here..."
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids)
summary = tokenizer.decode(output[0], skip_special_tokens=True)
print(summary)

# Alternatively, use the summarization pipeline.
summarizer = pipeline("summarization", model=model_name)
summary = summarizer(input_text, max_length=50, min_length=30, do_sample=False)
print(summary)
```
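Full scientific papers usually exceed BART's 1024-token input window. The sketch below splits a long document into token chunks and summarizes each chunk separately; the chunk size, the helper function name, and the generation settings are illustrative assumptions, not part of the model release.

```python
from transformers import BartTokenizer, pipeline

model_name = "com3dian/Bart-large-paper2slides-summarizer"
tokenizer = BartTokenizer.from_pretrained(model_name)
summarizer = pipeline("summarization", model=model_name)

def summarize_long_text(text, chunk_tokens=900):
    """Hypothetical helper: split `text` into chunks that fit BART's
    1024-token window and summarize each chunk separately."""
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [
        tokenizer.decode(token_ids[i:i + chunk_tokens])
        for i in range(0, len(token_ids), chunk_tokens)
    ]
    return [
        summarizer(chunk, max_length=100, min_length=30, do_sample=False)[0]["summary_text"]
        for chunk in chunks
    ]

section_summaries = summarize_long_text("Full paper text here...")
print("\n".join(section_summaries))
```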
Documentation
Model Details
Bart (Bidirectional and Auto-Regressive Transformers) is a sequence-to-sequence (seq2seq) model developed by Facebook AI Research. The Bart-Large model consists of 12 encoder and 12 decoder layers and has a total of about 400 million parameters.
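If you want to verify these figures for the checkpoint you downloaded, a quick sanity check is to read the layer counts from the configuration and count parameters after loading the weights. This is only an inspection snippet, not part of the model card's evaluation.

```python
from transformers import BartConfig, BartForConditionalGeneration

model_name = "com3dian/Bart-large-paper2slides-summarizer"

# Layer counts come straight from the configuration file.
config = BartConfig.from_pretrained(model_name)
print(config.encoder_layers, "encoder layers /", config.decoder_layers, "decoder layers")

# Counting parameters requires loading the weights.
model = BartForConditionalGeneration.from_pretrained(model_name)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```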
Model Fine-tuning Details
The fine-tuning process used unsupervised learning on the slide generation dataset: the model learns to back-summarize the output of the expansion model into the original texts. The specific hyperparameters are listed below (a sketch of the corresponding training step follows the list):
- Batch Size: 4
- Learning Rate: 2e-6
- Training Steps: 3*7
- Optimizer: AdamW
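Below is a minimal sketch of what one back-summarization training step with these hyperparameters could look like in PyTorch. The base checkpoint, the paired expanded/original texts, and the batch construction are assumptions for illustration; the actual training script is not part of this repository.

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
summarizer = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Hyperparameters listed above.
optimizer = torch.optim.AdamW(summarizer.parameters(), lr=2e-6)
batch_size = 4

# One illustrative step: the summarizer learns to map an expansion
# (produced by Bart-large-paper2slides-expander) back to the original text.
expanded_texts = ["Expanded slide text produced by the expander model..."] * batch_size
original_texts = ["Original slide text..."] * batch_size

inputs = tokenizer(expanded_texts, return_tensors="pt", padding=True, truncation=True)
labels = tokenizer(original_texts, return_tensors="pt", padding=True, truncation=True).input_ids
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss

loss = summarizer(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```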
Model Performance
The Bart-Large Slide Generation Model has been evaluated by humans across various scientific domains, such as AI, mathematics, statistics, history, geography, and climate science, and compared with the Bart-large-cnn model.
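One informal way to reproduce such a comparison is to run both checkpoints on the same passage and inspect the outputs side by side; the passage and generation settings below are placeholders, and this does not replicate the human evaluation itself.

```python
from transformers import pipeline

passage = "Paste a paragraph from a scientific paper here..."

# Compare this model's output with the Bart-large-cnn baseline.
for name in ["com3dian/Bart-large-paper2slides-summarizer", "facebook/bart-large-cnn"]:
    summarizer = pipeline("summarization", model=name)
    summary = summarizer(passage, max_length=60, min_length=20, do_sample=False)[0]["summary_text"]
    print(f"{name}:\n{summary}\n")
```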
Technical Details
Reducing sequential computation is a key goal. Models such as the Extended Neural GPU, ByteNet, and ConvS2S use convolutional neural networks, in which the number of operations needed to relate signals from two arbitrary positions grows with the distance between them. The Transformer reduces this to a constant number of operations, using Multi-Head Attention to counteract the resulting loss of effective resolution. Self-attention computes a representation of a single sequence by relating its positions to one another.
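The constant-operation property comes from self-attention relating every position to every other position in a single weighted sum. The snippet below is a bare-bones scaled dot-product self-attention over a toy sequence, for illustration only; it is not the model's internal implementation, and the dimensions and random projections are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d_model = 5, 16
x = torch.randn(seq_len, d_model)          # one toy input sequence

# Projections (random here, learned in practice) produce queries, keys, and values.
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Every position attends to every other position in one step,
# regardless of how far apart they are in the sequence.
scores = Q @ K.T / (d_model ** 0.5)
weights = F.softmax(scores, dim=-1)        # (seq_len, seq_len) attention matrix
output = weights @ V                       # contextualized representations
print(weights.shape, output.shape)
```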
License
This model and the associated code are released under the MIT license.
Acknowledgments
We thank the authors of the Bart model and the creators of the slide generation dataset for their contributions. If you use this model, please cite the original Bart model, the slide generation dataset, and this paper.