IT5 Base News Summarization

Developed by gsarti

An Italian news summarization model fine-tuned from the IT5 base model, capable of extracting key information from news texts to generate concise summaries.

Text Generation, Other. Open Source License: Apache-2.0. Tags: #Italian Summarization, #News Condensation, #Sequence-to-Sequence

Downloads 405

Release Time: 3/2/2022

Model Overview

This model performs automatic summarization of Italian news. It was fine-tuned on the Fanpage and Il Post datasets to extract the core content of news articles.

Model Features

Italian Language Optimization

Fine-tuned specifically on Italian news text, starting from IT5, a model pretrained for Italian.

Multi-source Data Training

Trained on news datasets from two different sources, Fanpage and Il Post, so the model sees more than one editorial style.

Efficient Summarization

Condenses full-length news articles into short summaries.

Model Capabilities

Italian Text Understanding

News Summarization

Long Text Compression

Key Information Extraction

Use Cases

News Media

Automatic News Summarization

Automatically generates brief summaries of news content for news websites or applications.

Reported scores: ROUGE-1 0.339, ROUGE-2 0.160, ROUGE-L 0.263
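
For readers unfamiliar with the metric, the sketch below shows what a ROUGE-N score measures: the n-gram overlap between a generated summary and a reference summary. It is a simplified illustration, not the evaluation script behind the numbers above. The function names, the whitespace tokenization, the lowercasing, and the choice of the F1 variant are all assumptions made for this example; the paper describes the actual evaluation approach.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """Simplified ROUGE-N F1 between two strings.

    Assumption: lowercased whitespace tokenization, no stemming.
    Real ROUGE implementations differ in these details.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# A summary that matches the first three of five reference words.
print(rouge_n_f1("il gatto dorme", "il gatto dorme sul divano", n=1))  # 0.75
```

In this toy case precision is 1.0 (every candidate word appears in the reference) and recall is 0.6 (three of five reference words are covered), giving an F1 of 0.75.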

Content Aggregation

Multi-source News Summarization

Generates unified summary content from multiple news sources

🚀 IT5 Base for News Summarization ✂️🗞️ 🇮🇹

This repository contains the checkpoint for the IT5 Base model fine-tuned for news summarization on the Fanpage and Il Post corpora. It is part of the experiments in the paper IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation by Gabriele Sarti and Malvina Nissim.

The gsarti/it5 repository provides a comprehensive overview of other released materials. For more details about the reported scores and the evaluation approach, refer to the paper.

✨ Features

Italian Focus: Summarizes Italian news, building on the IT5 Base model.
Multiple Frameworks Compatibility: Model checkpoints are available for TensorFlow, PyTorch, and JAX.
Easy-to-Use: Can be used directly with pipelines or loaded using autoclasses.

📦 Installation

The original model card gives no installation steps. The usage examples below require the transformers library.

💻 Usage Examples

Basic Usage

from transformers import pipeline

# Build a summarization pipeline backed by the fine-tuned checkpoint
newsum = pipeline("summarization", model='it5/it5-base-news-summarization')
newsum("Dal 31 maggio è infine partita la piattaforma ITsART, a più di un anno da quando – durante il primo lockdown – il ministro della Cultura Dario Franceschini ne aveva parlato come di «una sorta di Netflix della cultura», pensata per «offrire a tutto il mondo la cultura italiana a pagamento». È presto per dare giudizi definitivi sulla piattaforma, e di certo sarà difficile farlo anche più avanti senza numeri precisi. Al momento, l’unica cosa che si può fare è guardare com’è fatto il sito, contare quanti contenuti ci sono (circa 700 “titoli”, tra film, documentari, spettacoli teatrali e musicali e altri eventi) e provare a dare un giudizio sul loro valore e sulla loro varietà. Intanto, una cosa notata da più parti è che diversi contenuti di ITsART sono a pagamento sulla piattaforma sebbene altrove, per esempio su RaiPlay, siano invece disponibili gratuitamente.")
>>> [{"generated_text": "ITsART, la Netflix della cultura italiana, parte da maggio. Film, documentari, spettacoli teatrali e musicali disponibili sul nuovo sito a pagamento."}]

Advanced Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("it5/it5-base-news-summarization")
model = AutoModelForSeq2SeqLM.from_pretrained("it5/it5-base-news-summarization")
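
Loading the tokenizer and model is only the first step. The sketch below shows one way to turn them into a summary. The helper names `load_summarizer` and `summarize`, the PyTorch backend (`return_tensors="pt"`), the 512-token input cap, and the beam search settings are assumptions chosen for illustration and are not taken from the model card.

```python
def load_summarizer(model_name="it5/it5-base-news-summarization"):
    """Load tokenizer and model (hypothetical helper; downloads the checkpoint)."""
    # Imported here so that summarize() can be used with any compatible objects.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model


def summarize(text, tokenizer, model,
              max_input_tokens=512, max_new_tokens=64, num_beams=4):
    """Generate a summary for one article (illustrative defaults, not tuned)."""
    # Tokenize, truncating articles longer than the assumed input budget.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search decoding; the settings are assumptions for this sketch.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=num_beams,
    )
    # Decode the first (and only) sequence in the batch back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the model downloaded, a call would look like `tokenizer, model = load_summarizer()` followed by `summarize(article_text, tokenizer, model)`, where `article_text` is an Italian news article. Keeping loading and generation separate lets the model be loaded once and reused across many articles.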

📚 Documentation

A comprehensive overview of other released materials is provided in the gsarti/it5 repository. Refer to the paper IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation for additional details concerning the reported scores and the evaluation approach.

📄 License

The model is released under the Apache-2.0 license.

BibTeX Citation

If you use this model in your research, please cite our work as:

@article{sarti-nissim-2022-it5,
    title={{IT5}: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation},
    author={Sarti, Gabriele and Nissim, Malvina},
    journal={ArXiv preprint 2203.03759},
    url={https://arxiv.org/abs/2203.03759},
    year={2022},
    month={mar}
}

📋 Information Table

Property	Details
Model Type	IT5 Base fine-tuned for news summarization
Training Data	Fanpage and Il Post corpora
License	Apache-2.0
Tags	italian, sequence-to-sequence, fanpage, ilpost, summarization
CO2 Emissions	17 (source: Google Cloud Platform Carbon Footprint, fine-tuning, Eemshaven, Netherlands, Europe, 1 TPU v3-8 VM)
Thumbnail	https://gsarti.com/publication/it5/featured.png
