it5-small-news-summarization Open Source Model - Free Support for Automatic Summarization of Italian News

It5 Small News Summarization

Developed by gsarti

An Italian news summarization model fine-tuned from IT5 Small on the Fanpage and Il Post datasets

Text Generation | Other | Open Source | License: Apache-2.0 | #Italian News Summarization #Sequence-to-Sequence Model #Multi-source Dataset Training

Downloads: 27

Release Time: 3/2/2022

Model Overview

This model automatically summarizes Italian news articles, condensing long texts into concise summaries that keep their key information.

Model Features

Italian Language Optimization

Both pre-trained and fine-tuned on Italian text, targeting Italian news summarization

Multi-dataset Training

Fine-tuned on datasets from two major Italian news outlets: Fanpage and Il Post

Efficient Inference

Small model design suitable for production deployment, balancing performance and computational resource requirements

Model Capabilities

Italian Text Understanding

News Summarization Generation

Key Information Extraction

Use Cases

News Media

Automatic News Summarization

Automatically generates article summaries for news websites to improve reader browsing efficiency

Achieved a ROUGE-1 score of 0.333 on the Il Post test set and 0.328 on the Fanpage test set
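As a rough illustration of what these scores measure, the sketch below computes ROUGE-1 precision, recall, and F1 from the clipped unigram overlap between a reference summary and a generated one. It assumes lowercased whitespace tokenization with no stemming or punctuation handling, so it is not the paper's scorer and will not reproduce the reported numbers (the paper defines the exact ROUGE variant and aggregation used). The function name `rouge1` and the example sentences are illustrative.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict[str, float]:
    """ROUGE-1 precision, recall, and F1 from unigram overlap.

    Assumption: lowercased whitespace tokenization, no stemming or
    punctuation stripping (official ROUGE scorers differ on this).
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most min(count in ref, count in candidate)
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1(
    "la piattaforma ITsART è partita a maggio",
    "ITsART è partita a maggio con film e documentari",
)
print(round(scores["f1"], 3))  # 0.625
```

In the example, 5 of the 7 reference unigrams also appear in the 9-token candidate, giving precision 5/9, recall 5/7, and F1 = 2·5/(7+9) = 0.625.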

Content Aggregation

Multi-source News Summarization

Aggregates news from different Italian sources and generates unified summaries
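The model was fine-tuned to summarize one article at a time, so a multi-source workflow has to be built around it. A minimal sketch of one possible approach, under that assumption, summarizes each article separately and keeps the results grouped by source. `summarize_by_source` and `summarize_batch` are hypothetical names, and `first_sentence` is only a stand-in summarizer so the sketch runs without downloading the model.

```python
from collections.abc import Callable


def summarize_by_source(
    articles: dict[str, list[str]],
    summarize_batch: Callable[[list[str]], list[str]],
) -> dict[str, list[str]]:
    """Summarize every article, keeping one summary per article grouped by source.

    `summarize_batch` is a hypothetical wrapper around the model that maps a
    list of article texts to a list of summaries of the same length.
    """
    grouped: dict[str, list[str]] = {}
    for source, texts in articles.items():
        summaries = summarize_batch(texts)
        if len(summaries) != len(texts):
            raise ValueError(
                f"expected {len(texts)} summaries for {source!r}, got {len(summaries)}"
            )
        grouped[source] = summaries
    return grouped


# Stand-in summarizer for illustration only: keeps each article's first sentence
def first_sentence(texts: list[str]) -> list[str]:
    return [t.split(".")[0] + "." for t in texts]


print(summarize_by_source({"Il Post": ["Prima frase. Seconda frase."]}, first_sentence))
# {'Il Post': ['Prima frase.']}
```

To use the actual model, `summarize_batch` could wrap the pipeline from the basic usage example, for instance `lambda texts: [out["summary_text"] for out in newsum(texts)]`, since the summarization pipeline accepts a list of strings and returns one dict per input.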

🚀 IT5 Small for News Summarization ✂️🗞️ 🇮🇹

This repository holds the checkpoint for the IT5 Small model fine-tuned for news summarization on the Fanpage and Il Post corpora. It is part of the experiments in the paper IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation by Gabriele Sarti and Malvina Nissim.

For a full overview of other released materials, check the gsarti/it5 repository. Refer to the paper for more details on the reported scores and the evaluation approach.

🚀 Quick Start

✨ Features

The model is fine-tuned on Italian news corpora for news summarization.
It supports TensorFlow, PyTorch, and JAX.
It can be used directly with the pipeline API or loaded via the Auto classes.

📦 Installation

The examples below require the Hugging Face transformers library plus a supported backend (PyTorch, TensorFlow, or JAX); the Advanced Usage example uses PyTorch tensors.

💻 Usage Examples

Basic Usage

from transformers import pipeline

newsum = pipeline("summarization", model='it5/it5-small-news-summarization')
newsum("Dal 31 maggio è infine partita la piattaforma ITsART, a più di un anno da quando – durante il primo lockdown – il ministro della Cultura Dario Franceschini ne aveva parlato come di «una sorta di Netflix della cultura», pensata per «offrire a tutto il mondo la cultura italiana a pagamento». È presto per dare giudizi definitivi sulla piattaforma, e di certo sarà difficile farlo anche più avanti senza numeri precisi. Al momento, l’unica cosa che si può fare è guardare com’è fatto il sito, contare quanti contenuti ci sono (circa 700 “titoli”, tra film, documentari, spettacoli teatrali e musicali e altri eventi) e provare a dare un giudizio sul loro valore e sulla loro varietà. Intanto, una cosa notata da più parti è che diversi contenuti di ITsART sono a pagamento sulla piattaforma sebbene altrove, per esempio su RaiPlay, siano invece disponibili gratuitamente.")
# Output: [{"summary_text": "ITsART, la Netflix della cultura italiana, parte da maggio. Film, documentari, spettacoli teatrali e musicali disponibili sul nuovo sito a pagamento."}]

Advanced Usage

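from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("it5/it5-small-news-summarization")
model = AutoModelForSeq2SeqLM.from_pretrained("it5/it5-small-news-summarization")
# Summarize one article (PyTorch tensors); max_new_tokens=64 is an assumed value, not from the model card
inputs = tokenizer("Dal 31 maggio è infine partita la piattaforma ITsART.", return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))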

📚 Documentation

The main documentation about the model can be found in the paper IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation.

📄 License

The model is licensed under the Apache-2.0 license.

📋 Information Table

Property	Details
Model Type	IT5 Small fine-tuned for news summarization
Training Data	Fanpage and Il Post corpora
Tags	italian, sequence-to-sequence, fanpage, ilpost, summarization
CO2 Emissions	8 g (source: Google Cloud Platform Carbon Footprint; training type: fine-tuning; geographical location: Eemshaven, Netherlands, Europe; hardware used: 1 TPU v3-8 VM)
Thumbnail	https://gsarti.com/publication/it5/featured.png

💡 Usage Tip

If you use this model in your research, please cite our work as:

@article{sarti-nissim-2022-it5,
    title={{IT5}: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation},
    author={Sarti, Gabriele and Nissim, Malvina},
    journal={ArXiv preprint 2203.03759},
    url={https://arxiv.org/abs/2203.03759},
    year={2022},
    month={mar}
}
