News Articles Teacher-Student Abstractive Summarizer
A fine-tuned model based on BART-large for high-quality news article summarization with improved efficiency.
🚀 Quick Start
```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained("JordiAb/BART_news_summarizer")
tokenizer = AutoTokenizer.from_pretrained("JordiAb/BART_news_summarizer")

article_text = """
Los Angeles Lakers will have more time than anticipated. The four-time NBA Most Valuable Player (MVP) extended his contract for two years and $85 million, keeping him in California until 2023. In 2018, The King had already signed for $153 million and, in his second campaign in the quintet, led the championship in the Orlando bubble. With 35 years of life (he turns 36 on December 30) and 17 campaigns of experience, LeBron is still considered one of the best (or the best) NBA players. You can read: "Mercedes found Lewis Hamilton's substitute" James just took the Lakers to his first NBA title since 2010 and was named MVP of the Finals; he led the League in assists per game (10.2) for the first time in his career, while adding 25.3 points and 7.8 rebounds per performance, during the last campaign. James has adapted to life in Hollywood, as he will be part of the sequel to Space Jam, to be released next year.
"""

# Tokenize the article, truncating to BART's 1024-token input limit
inputs = tokenizer(article_text, return_tensors='pt', truncation=True, max_length=1024)

# Generate the summary with beam search
with torch.no_grad():
    summary_ids = model.generate(
        inputs['input_ids'],
        num_beams=4,
        max_length=250,
        early_stopping=True
    )

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```
✨ Features
- High-Quality Summarization: Produces high-quality abstractive summaries of news articles.
- Efficiency: Runs faster and uses less GPU memory and compute than the teacher model.
📦 Installation
The model runs on PyTorch with the Hugging Face transformers library.
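A typical setup (the original card does not pin versions, so treat this as a baseline):

```bash
pip install torch transformers
```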
💻 Usage Examples
Basic Usage
The basic usage pattern is identical to the Quick Start example above: load the model and tokenizer with `from_pretrained`, tokenize the article, and decode the output of `model.generate`.
Advanced Usage
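For higher throughput, articles can be summarized in batches on a GPU. The helper below is a minimal sketch, not part of the original card: the `summarize_batch` function, batch size, and device handling are illustrative assumptions, while the model ID and generation settings match the Quick Start example.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForSeq2SeqLM.from_pretrained("JordiAb/BART_news_summarizer").to(device)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("JordiAb/BART_news_summarizer")

def summarize_batch(articles, batch_size=8):
    """Summarize a list of article strings in fixed-size batches (illustrative helper)."""
    summaries = []
    for i in range(0, len(articles), batch_size):
        batch = articles[i:i + batch_size]
        inputs = tokenizer(
            batch,
            return_tensors='pt',
            padding=True,        # pad shorter articles to the longest in the batch
            truncation=True,
            max_length=1024,     # BART's encoder input limit
        ).to(device)
        with torch.no_grad():
            summary_ids = model.generate(
                inputs['input_ids'],
                attention_mask=inputs['attention_mask'],
                num_beams=4,
                max_length=250,
                early_stopping=True,
            )
        summaries.extend(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
    return summaries
```

Passing the `attention_mask` keeps padding tokens from influencing generation, and `torch.no_grad()` avoids storing activations during inference.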
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Model Type | Abstractive Summarization |
| Base Model | BART-large |
| Teacher Model | StableBeluga-7B |
| Language | English |
Dataset
- Source: 295,174 news articles scraped from a Mexican newspaper.
- Translation: The Spanish articles were translated to English with the Helsinki-NLP/opus-mt-es-en model (a sketch of this step follows the list).
- Teacher Summaries: Generated by StableBeluga-7B.
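The translation step can be reproduced with the transformers `pipeline` API. A minimal sketch, assuming only the model ID from the card (the sample sentence is illustrative):

```python
from transformers import pipeline

# Spanish-to-English translation model named in the dataset description above
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")

spanish_articles = [
    "Los Angeles Lakers tendrán a LeBron James más tiempo del previsto.",  # illustrative sample
]
english_articles = [out["translation_text"] for out in translator(spanish_articles)]
print(english_articles[0])
```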
Training
The fine-tuning process used the teacher observations (summaries) generated by StableBeluga-7B as training targets for a lightweight BART model, a form of sequence-level knowledge distillation. This approach aims to replicate the summarization quality of the teacher model while achieving faster inference and lower GPU memory usage; a sketch of the setup follows.
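In code, this amounts to standard supervised fine-tuning with the teacher's outputs as targets. A minimal sketch using the Hugging Face `Seq2SeqTrainer`; the hyperparameters and the placeholder `articles`/`teacher_summaries` lists are illustrative assumptions, not the authors' actual configuration:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq,
    Seq2SeqTrainer, Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

# Placeholder parallel data: each summary was produced by StableBeluga-7B
# for the matching (translated) article.
articles = ["LeBron James extended his contract with the Lakers for two years..."]
teacher_summaries = ["LeBron James signed a two-year extension with the Lakers."]
dataset = Dataset.from_dict({"article": articles, "summary": teacher_summaries})

def preprocess(batch):
    # Articles become encoder inputs; teacher summaries become the decoder labels
    model_inputs = tokenizer(batch["article"], truncation=True, max_length=1024)
    labels = tokenizer(text_target=batch["summary"], truncation=True, max_length=250)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="bart_news_summarizer",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=3e-5,
        fp16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```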
Performance
- Evaluation Metrics (a reproduction sketch follows this list):
  - ROUGE-1: 0.66
  - Cosine Similarity: 0.90
- Inference Speed: 3x faster than the teacher model (StableBeluga-7B)
- Resource Usage: significantly less GPU memory than StableBeluga-7B
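A sketch of how these numbers could be reproduced, assuming the `evaluate` and `sentence-transformers` packages; the embedding model used for cosine similarity (all-MiniLM-L6-v2) is an assumption, since the card does not name one:

```python
import evaluate
from sentence_transformers import SentenceTransformer, util

# Placeholder lists: student (BART) summaries vs. teacher (StableBeluga-7B) references
predictions = ["LeBron James signed a two-year, $85 million extension with the Lakers."]
references = ["LeBron James extended his Lakers contract for two years and $85 million."]

# ROUGE-1 via the Hugging Face `evaluate` package
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references)["rouge1"])

# Mean cosine similarity between prediction and reference sentence embeddings
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
pred_emb = embedder.encode(predictions, convert_to_tensor=True)
ref_emb = embedder.encode(references, convert_to_tensor=True)
print(util.cos_sim(pred_emb, ref_emb).diagonal().mean().item())
```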
Objective
The primary goal of this model is to provide a lightweight summarization solution that maintains high-quality output similar to the teacher model (StableBeluga-7B) but operates with greater efficiency, making it suitable for deployment in resource-constrained environments.
Use Cases
This model is ideal for applications requiring quick and efficient summarization of large volumes of news articles, particularly in settings where computational resources are limited.
Limitations
- Language Translation: The initial translation from Spanish to English may introduce minor inaccuracies that could affect the summarization quality.
- Domain Specificity: Because the model was fine-tuned specifically on news articles, performance may vary on texts from other domains.
Future Work
Future improvements could involve:
- Fine-tuning the model on bilingual data to eliminate the translation step.
- Expanding the dataset to include a wider variety of news sources and topics.
- Exploring further optimizations to reduce inference time and resource usage.
🔧 Technical Details
The model is a fine-tuned version of BART-large that uses StableBeluga-7B as the teacher model. The fine-tuning process leverages the summaries generated by the teacher to train the BART student, aiming for similar summarization quality with better speed and lower resource usage.
Conclusion
The News Articles Teacher-Student Abstractive Summarizer demonstrates the potential to deliver high-quality summaries efficiently, making it a valuable tool for news content processing and similar applications.