data-silence/any-news-sum
This repository holds an mT5 checkpoint fine-tuned on 45 languages from the sumnews dataset, which is derived from the well-known XL-Sum. The model targets the news summarization task: given the full text of a news article, it generates a headline and a summary simultaneously. Although training focused mainly on Russian, the model can, to some extent, handle text in any language supported by the mT5 base model and the XL-Sum dataset.
Quick Start
Testing this model on Spaces
You can try out the trained model here
Usage Examples
Basic Usage
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "data-silence/any-news-sum"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)


def generate_summary_with_special_tokens(text, max_length=512):
    # Tokenize the article and move the tensors to the same device as the model
    inputs = tokenizer(text, return_tensors="pt", max_length=max_length, truncation=True).to(device)
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        num_return_sequences=1,
        no_repeat_ngram_size=4,
    )
    # Keep special tokens so the output can be split into a headline and a summary
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=False)
    parts = generated_text.split('<title_resume_sep>')
    title = parts[0].replace("<pad> ", "").strip()
    resume = parts[1].replace("</s>", "").strip() if len(parts) > 1 else ""
    return title, resume


title, resume = generate_summary_with_special_tokens('Patients with heart disease often have a low level of melatonin and a disrupted sleep-wake cycle. Until now, the mechanisms underlying this phenomenon remained unclear. In an article published in the journal Science, a team from the Technical University of Munich (TUM) shows exactly how heart disease affects the production of the sleep hormone in the pineal gland. The ganglion in the neck area turns out to be the connecting link between the two organs.')
print(title)
print(resume)
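The same tokenizer, model, and device can be reused for several articles at once. The sketch below is an assumption about how batched inference could look, not part of the original example; it pads a list of articles and splits each decoded output on the `<title_resume_sep>` token.

# Hypothetical batched variant; reuses tokenizer, model, and device from the snippet above.
def generate_summaries_batch(texts, max_length=512):
    inputs = tokenizer(texts, return_tensors="pt", max_length=max_length,
                       truncation=True, padding=True).to(device)
    outputs = model.generate(**inputs, max_length=max_length, no_repeat_ngram_size=4)
    results = []
    for output in outputs:
        # Decode with special tokens kept, then strip padding and end-of-sequence markers
        decoded = tokenizer.decode(output, skip_special_tokens=False)
        decoded = decoded.replace("<pad>", "").replace("</s>", "")
        parts = decoded.split('<title_resume_sep>')
        title = parts[0].strip()
        resume = parts[1].strip() if len(parts) > 1 else ""
        results.append((title, resume))
    return results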
Documentation
Training hyperparameters
The following hyperparameters were used during training; a sketch of the equivalent Seq2SeqTrainingArguments follows the list:
- learning_rate: 2e-05
- train_batch_size: 6
- eval_batch_size: 6
- seed: 42
- gradient_accumulation_steps: 6
- total_train_batch_size: 36
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 4
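For reference, a minimal, hypothetical reconstruction of these settings as Seq2SeqTrainingArguments is shown below. The output_dir is a placeholder, and any option not listed above is left at its library default (Adam betas of (0.9, 0.999) and epsilon of 1e-08 are the defaults).

from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the training configuration from the values above;
# output_dir is a placeholder and unlisted options keep their defaults.
training_args = Seq2SeqTrainingArguments(
    output_dir="./any-news-sum",       # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    seed=42,
    gradient_accumulation_steps=6,     # 6 * 6 = total train batch size of 36
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=4,
)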
Evaluation results
This model achieves the following results on the evaluation set:
| Property | Details |
|----------|---------|
| Training Loss | 0.4487 |
| Epoch | 4.0 |
| Step | 20496 |
| Evaluation Runtime (s) | 3433.4702 |
| Evaluation Samples/Sec | 9.37 |
| Evaluation Steps/Sec | 1.562 |
| Evaluation Loss | 0.2748 |
| Evaluation Title (ROUGE-1) | 0.1373 |
| Evaluation Title (ROUGE-2) | 0.0489 |
| Evaluation Title (ROUGE-L) | 0.1220 |
| Evaluation Resume (ROUGE-1) | 0.0016 |
| Evaluation Resume (ROUGE-2) | 0.0005 |
| Evaluation Resume (ROUGE-L) | 0.0015 |
Framework versions
- Transformers 4.42.4
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1