🚀 NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish
Most existing abstractive summarization models in the literature are tailored for English. Our work presents monolingual models for Catalan and Spanish that offer better performance than multilingual alternatives, together with a new evaluation metric for abstractivity.
🚀 Quick Start
No specific quick-start information is provided; see the Usage Examples section below.
✨ Features
- Monolingual Focus: Our models are specifically designed for Catalan and Spanish, addressing the limitations of multilingual models, especially for minority languages like Catalan.
- Enhanced Abstractivity: Through several self-supervised pre-training tasks, the abstractivity of the generated summaries is increased.
- New Evaluation Metric: We introduce a new metric called content reordering to better evaluate the abstractivity of generated summaries.
📦 Installation
No installation instructions are provided.
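A likely minimal setup, assuming the checkpoints are distributed through the Hugging Face Hub (as the usage sketch below assumes), is `pip install transformers` together with a backend such as PyTorch.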
💻 Usage Examples
No code examples are provided in the original documentation.
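As an unofficial illustration, the sketch below shows how a BART-style summarization checkpoint such as NASes is typically loaded with the Hugging Face `transformers` library. The repository identifier `ELiRF/NASES` and the generation parameters are assumptions; substitute the actual Hub name of the checkpoint (and its Catalan counterpart) if it differs.

```python
# Unofficial sketch: loading a BART-style summarization checkpoint with transformers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "ELiRF/NASES"  # assumed Hub identifier for the Spanish model; NASca would be analogous

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "Texto completo del artículo de prensa que se desea resumir..."

# Tokenize, truncating to the encoder's maximum input length.
inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt")

# Generate an abstractive summary with beam search (parameters are illustrative).
summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    num_beams=4,
    max_length=150,
    early_stopping=True,
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Beam search with a small number of beams is a common default for news summarization; adjust `max_length` to the expected summary length.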
📚 Documentation
General Introduction
Most models for abstractive summarization in the literature are suitable for English but not for other languages. Multilingual models were introduced to overcome language constraints, but their performance is often lower, especially for minority languages. In this paper, we present monolingual models for Catalan and Spanish.
NASes Model
- Model Structure: News Abstractive Summarization for Spanish (NASes) is a Transformer encoder-decoder model with the same hyper-parameters as BART.
- Pre-training: It is pre-trained on a combination of self-supervised tasks (sentence permutation, text infilling, Gap Sentence Generation, and Next Segment Generation) using Spanish newspapers and Wikipedia articles (21 GB of raw text; 8.5 million documents). An illustrative sketch of two of these tasks appears after this list.
- Fine-tuning: NASes is fine-tuned for the summarization task on 1,802,919 (document, summary) pairs from the Dataset for Automatic summarization of Catalan and Spanish newspaper Articles (DACSA).
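Sentence permutation and text infilling are the denoising objectives introduced with BART, while Gap Sentence Generation follows PEGASUS. The sketch below illustrates the first two corruptions on raw text; it is a simplified illustration under stated assumptions (naive sentence splitting, fixed span lengths), not the authors' pre-processing code.

```python
import random
import re

def sentence_permutation(document: str) -> str:
    """Shuffle sentence order; the model is trained to restore the original document."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    random.shuffle(sentences)
    return " ".join(sentences)

def text_infilling(document: str, mask_token: str = "<mask>", mask_prob: float = 0.3) -> str:
    """Replace short token spans with a single mask token; the model regenerates them."""
    tokens = document.split()
    corrupted = []
    i = 0
    while i < len(tokens):
        if random.random() < mask_prob:
            corrupted.append(mask_token)
            i += random.randint(1, 3)  # illustrative span length, not the paper's exact sampling
        else:
            corrupted.append(tokens[i])
            i += 1
    return " ".join(corrupted)

if __name__ == "__main__":
    doc = "Primera frase. Segunda frase. Tercera frase con más palabras."
    print(sentence_permutation(doc))
    print(text_infilling(doc))
```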
New Evaluation Metric
Usual evaluation metrics such as ROUGE and BERTScore cannot properly evaluate the abstractivity of generated summaries. We present a new metric, content reordering, to evaluate the rearrangement of the original content, a common characteristic of abstractive summaries.
Experimentation
We carried out an exhaustive experimental comparison of our monolingual models with two widely used multilingual models for text summarization (mBART and mT5). The results support the quality of our monolingual models, especially considering that the multilingual models were pre-trained with far more resources.
🔧 Technical Details
- Model Architecture: The models are Transformer encoder - decoder architectures.
- Pre-training Tasks: The models are pre-trained with sentence permutation, text infilling, Gap Sentence Generation, and Next Segment Generation; these tasks help increase the abstractivity of the generated summaries.
- New Metric: The content reordering metric is designed to evaluate the rearrangement of the original content in abstractive summaries.
📄 License
No license information is provided.
⚠️ Important Note
On 5 April 2022, we detected a mistake in the configuration file; as a result, the model was not generating summaries correctly and was underperforming in all scenarios. If you used the model before that date and are publishing results obtained with it, we would be glad if you re-evaluated it. We apologize for the inconvenience and thank you for your understanding.
BibTeX entry
@Article{app11219872,
AUTHOR = {Ahuir, Vicent and Hurtado, Lluís-F. and González, José Ángel and Segarra, Encarna},
TITLE = {NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish},
JOURNAL = {Applied Sciences},
VOLUME = {11},
YEAR = {2021},
NUMBER = {21},
ARTICLE-NUMBER = {9872},
URL = {https://www.mdpi.com/2076-3417/11/21/9872},
ISSN = {2076-3417},
DOI = {10.3390/app11219872}
}