ClickbaitFighter-10B: An Open-Source Model That Reveals the Truth Behind Spanish Clickbait News

ClickbaitFighter-10B

Developed by Iker

A Spanish clickbait news summarization model fine-tuned on the NoticIA dataset. It reveals the actual content behind sensational headlines.

Large Language Model

Transformers

Spanish #Spanish Clickbait Analysis #News Summarization #High ROUGE Score

Downloads 48

Release Time: 3/22/2024

Model Overview

This model analyzes Spanish clickbait news articles and generates concise summaries that reveal the information the headline withholds. Fine-tuned from Nous-Hermes-2-SOLAR-10.7B, it achieves a ROUGE score of 52.01 on the NoticIA dataset, the highest of the three ClickbaitFighter models listed below.

Model Features

Clickbait News Analysis

Designed specifically for Spanish clickbait news; identifies and deconstructs sensational headlines.

Precise Summarization

Generates one-sentence summaries focused on the core facts, quoting the original text in quotation marks when the answer is something a person said.
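The prompt asks the model to quote the original text, in quotation marks, whenever it cites what someone said. As a hypothetical post-processing check (not part of the model or this card), the sketch below extracts the quoted spans from a generated summary and verifies that each one appears verbatim in the article body. The function names and the set of recognized quote characters are assumptions.

```python
import re

# Straight quotes, curly quotes and Spanish angle quotes (an assumed set).
QUOTE_PATTERN = r'"([^"]+)"|“([^”]+)”|«([^»]+)»'


def quoted_spans(summary: str) -> list[str]:
    """Return the text inside each quoted span of a summary, in order."""
    # Each match fills exactly one of the three groups; pick the one that matched.
    return [next(g for g in m.groups() if g) for m in re.finditer(QUOTE_PATTERN, summary)]


def quotes_are_verbatim(summary: str, body: str) -> bool:
    """True if every quoted span in the summary occurs verbatim in the body."""
    return all(span in body for span in quoted_spans(summary))
```

A summary without quotes passes trivially, so this only flags paraphrases presented as literal quotes.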

High-Quality Fine-tuning

Fine-tuned on the NoticIA dataset, achieving a ROUGE score of 52.01.
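ROUGE measures word overlap between a generated summary and a reference summary. The card does not say which ROUGE variant or tokenization was used, so the sketch below implements ROUGE-1 F1 (unigram overlap) with lowercase whitespace tokenization purely as an illustration. Library implementations apply their own tokenization and optional stemming, so scores from this sketch will not exactly match the reported numbers.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word (clipped overlap).
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge1_f1("el abrigo de la abuela", "un abrigo para la abuela"))
```

Reported scores such as 52.01 appear to be on a 0 to 100 scale, that is, an F1 multiplied by 100 and averaged over the test set.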

Model Capabilities

Spanish text comprehension

Clickbait content recognition

News summarization

Direct quote extraction

Use Cases

News Media

Clickbait News Fact-Checking

Automatically generates summaries that reveal the actual content of clickbait articles on news platforms

Helps readers quickly understand the substantive content of the news

Content Moderation

Social Media Content Moderation

Identifies misleading headline content on social media

Assists human moderation teams in improving efficiency

🚀 Clickbait Article Summarization Model

This model is fine-tuned on the NoticIA dataset and generates summaries of articles with clickbait headlines.

🚀 Quick Start

Install the libraries listed under Installation, then run one of the Usage Examples below.

✨ Features

Clickbait Summarization: It analyzes news articles with clickbait headlines and generates single-sentence summaries that reveal the truth behind the headlines.
Other Languages: The model targets Spanish; it could potentially be adapted to other languages with additional fine-tuning.

📦 Installation

To use this model, install the following libraries (accelerate is needed because the examples load the model with device_map="auto"):

pip install torch
pip install newspaper3k
pip install transformers
pip install datasets
pip install accelerate
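Before downloading the model, you can confirm that the required libraries are importable. The sketch below is a hypothetical helper, not part of the card; note that newspaper3k is imported as newspaper, and that accelerate is included because loading with device_map="auto" requires it.

```python
import importlib.util

# Import names of the packages used by the examples (newspaper3k -> "newspaper").
REQUIRED_MODULES = ["torch", "newspaper", "transformers", "datasets", "accelerate"]


def missing_modules(modules: list[str]) -> list[str]:
    """Return the top-level modules that cannot be found, without importing them."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```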

💻 Usage Examples

Basic Usage

Summarize a web article

import torch # pip install torch
from newspaper import Article # pip install newspaper3k
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig # pip install transformers

article_url = "https://www.huffingtonpost.es/virales/le-compra-abrigo-abuela-97nos-reaccion-fantasia.html"
article = Article(article_url)
article.download()
article.parse()
headline = article.title
body = article.text

def prompt(
    headline: str,
    body: str,
) -> str:
    """
    Generate the prompt for the model.

    Args:
        headline (`str`):
            The headline of the article.
        body (`str`):
            The body of the article.
    Returns:
        `str`: The formatted prompt.
    """

    return (
        f"Ahora eres una Inteligencia Artificial experta en desmontar titulares sensacionalistas o clickbait. "
        f"Tu tarea consiste en analizar noticias con titulares sensacionalistas y "
        f"generar un resumen de una sola frase que revele la verdad detrás del titular.\n"
        f"Este es el titular de la noticia: {headline}\n"
        f"El titular plantea una pregunta o proporciona información incompleta. "
        f"Debes buscar en el cuerpo de la noticia una frase que responda lo que se sugiere en el título. "
        f"Siempre que puedas cita el texto original, especialmente si se trata de una frase que alguien ha dicho. "
        f"Si citas una frase que alguien ha dicho, usa comillas para indicar que es una cita. "
        f"Usa siempre las mínimas palabras posibles. No es necesario que la respuesta sea una oración completa. "
        f"Puede ser sólo el foco de la pregunta. "
        f"Recuerda responder siempre en Español.\n"
        f"Este es el cuerpo de la noticia:\n"
        f"{body}\n"
    )

# Build the prompt under a new name so the prompt() function is not overwritten.
user_prompt = prompt(headline=headline, body=body)

tokenizer = AutoTokenizer.from_pretrained("Iker/ClickbaitFighter-10B")
model = AutoModelForCausalLM.from_pretrained(
    "Iker/ClickbaitFighter-10B", torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the prompt in the model's chat template and open the assistant turn.
formatted_prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": user_prompt}],
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer(
    [formatted_prompt], return_tensors="pt", add_special_tokens=False
)

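# Greedy decoding; the summary is a single short sentence, so output is capped at 32 new tokens.
model_output = model.generate(
    **model_inputs.to(model.device),
    generation_config=GenerationConfig(
        max_new_tokens=32,
        min_new_tokens=1,
        do_sample=False,
        num_beams=1,
        use_cache=True,
    ),
)

# Decode only the newly generated tokens, i.e. the summary without the prompt.
prompt_length = model_inputs["input_ids"].shape[1]
summary = tokenizer.decode(model_output[0][prompt_length:], skip_special_tokens=True)

print(summary.strip().split("\n")[0]) # Keep only the first line: the one-sentence summary.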
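Nous-Hermes-2 models use the ChatML prompt format. Assuming ClickbaitFighter-10B keeps its base model's chat template (print formatted_prompt to check), the sketch below approximates the string that apply_chat_template returns with add_generation_prompt=True. The function name is hypothetical, and the real template may add content such as a BOS token or a default system message.

```python
def chatml_prompt(user_message: str) -> str:
    """Approximate a single-turn ChatML prompt that ends with an open assistant turn."""
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(chatml_prompt("Este es el titular de la noticia: ..."))
```

Because the templated string already carries the model's special markers, the example tokenizes it with add_special_tokens=False, presumably to avoid adding a duplicate BOS token. When decoding with skip_special_tokens=True, markers such as <|im_start|> are dropped if the tokenizer registers them as special tokens.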

Run inference in the NoticIA dataset

import torch # pip install torch
from datasets import load_dataset # pip install datasets
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig # pip install transformers

dataset = load_dataset("Iker/NoticIA")
example = dataset["test"][0]
headline = example["web_headline"]
body = example["web_text"]

def prompt(
    headline: str,
    body: str,
) -> str:
    """
    Generate the prompt for the model.

    Args:
        headline (`str`):
            The headline of the article.
        body (`str`):
            The body of the article.
    Returns:
        `str`: The formatted prompt.
    """

    return (
        f"Ahora eres una Inteligencia Artificial experta en desmontar titulares sensacionalistas o clickbait. "
        f"Tu tarea consiste en analizar noticias con titulares sensacionalistas y "
        f"generar un resumen de una sola frase que revele la verdad detrás del titular.\n"
        f"Este es el titular de la noticia: {headline}\n"
        f"El titular plantea una pregunta o proporciona información incompleta. "
        f"Debes buscar en el cuerpo de la noticia una frase que responda lo que se sugiere en el título. "
        f"Siempre que puedas cita el texto original, especialmente si se trata de una frase que alguien ha dicho. "
        f"Si citas una frase que alguien ha dicho, usa comillas para indicar que es una cita. "
        f"Usa siempre las mínimas palabras posibles. No es necesario que la respuesta sea una oración completa. "
        f"Puede ser sólo el foco de la pregunta. "
        f"Recuerda responder siempre en Español.\n"
        f"Este es el cuerpo de la noticia:\n"
        f"{body}\n"
    )

# Build the prompt under a new name so the prompt() function is not overwritten.
user_prompt = prompt(headline=headline, body=body)

tokenizer = AutoTokenizer.from_pretrained("Iker/ClickbaitFighter-10B")
model = AutoModelForCausalLM.from_pretrained(
    "Iker/ClickbaitFighter-10B", torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the prompt in the model's chat template and open the assistant turn.
formatted_prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": user_prompt}],
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer(
    [formatted_prompt], return_tensors="pt", add_special_tokens=False
)

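# Greedy decoding; the summary is a single short sentence, so output is capped at 32 new tokens.
model_output = model.generate(
    **model_inputs.to(model.device),
    generation_config=GenerationConfig(
        max_new_tokens=32,
        min_new_tokens=1,
        do_sample=False,
        num_beams=1,
        use_cache=True,
    ),
)

# Decode only the newly generated tokens, i.e. the summary without the prompt.
prompt_length = model_inputs["input_ids"].shape[1]
summary = tokenizer.decode(model_output[0][prompt_length:], skip_special_tokens=True)

print(summary.strip().split("\n")[0]) # Keep only the first line: the one-sentence summary.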
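The examples pass the full article body into the prompt. For a very long article, the prompt may not fit in the model's context window. As a hypothetical precaution not taken in the examples above, the sketch below drops trailing paragraphs until the body fits a token budget. The budget is an assumption you must choose, for example model.config.max_position_embeddings minus the token length of the prompt template and the 32 new tokens; the function and parameter names are illustrative.

```python
from typing import Callable


def truncate_body(
    body: str,
    count_tokens: Callable[[str], int],
    max_body_tokens: int,
) -> str:
    """Drop trailing paragraphs until the body fits within max_body_tokens.

    count_tokens can be any token counter, e.g.
    lambda s: len(tokenizer(s, add_special_tokens=False)["input_ids"]).
    Returns an empty string if even the first paragraph is too long.
    """
    paragraphs = body.split("\n")
    while paragraphs and count_tokens("\n".join(paragraphs)) > max_body_tokens:
        paragraphs.pop()  # remove the last remaining paragraph
    return "\n".join(paragraphs)
```

Dropping paragraphs from the end is only a heuristic: the answer to a clickbait headline can appear anywhere in the article, so truncation can remove it.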

📚 Documentation

Model Information

Property	Details
Model Type	Fine-tuned from NousResearch/Nous-Hermes-2-SOLAR-10.7B
Training Data	Iker/NoticIA
Pipeline Tag	Text Generation
Evaluation Metrics	ROUGE

Open Source Models

Model	Iker/ClickbaitFighter-2B	Iker/ClickbaitFighter-7B	Iker/ClickbaitFighter-10B
Parameters	2B	7B	10B
ROUGE	36.26	49.81	52.01
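The table implies a size versus quality trade-off when choosing a model to deploy. The sketch below estimates the memory needed for the weights alone when loaded in bfloat16, as the usage examples do with torch_dtype=torch.bfloat16, and picks the smallest listed model that meets a ROUGE threshold. It assumes 2 bytes per parameter and ignores activations, the KV cache, and framework overhead, so real memory use will be higher. The helper names are hypothetical.

```python
from typing import Optional

# (parameters, ROUGE) per model, from the table above. The 10B entry uses the
# 10.7B size of its SOLAR base model; the 2B and 7B entries are nominal.
MODELS = {
    "Iker/ClickbaitFighter-2B": (2e9, 36.26),
    "Iker/ClickbaitFighter-7B": (7e9, 49.81),
    "Iker/ClickbaitFighter-10B": (10.7e9, 52.01),
}


def bf16_weight_gb(num_params: float) -> float:
    """Approximate weight memory in GB (1e9 bytes) at 2 bytes per bfloat16 parameter."""
    return num_params * 2 / 1e9


def smallest_model_meeting(min_rouge: float) -> Optional[str]:
    """Return the smallest listed model whose reported ROUGE is at least min_rouge."""
    for name, (params, rouge) in sorted(MODELS.items(), key=lambda item: item[1][0]):
        if rouge >= min_rouge:
            return name
    return None


if __name__ == "__main__":
    for name, (params, rouge) in MODELS.items():
        print(f"{name}: ROUGE {rouge}, ~{bf16_weight_gb(params):.1f} GB of bf16 weights")
```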


📄 License

This project is licensed under the CC-BY-NC-SA 4.0 license.

📚 Citation

@misc{noticia2024,
      title={NoticIA: A Clickbait Article Summarization Dataset in Spanish}, 
      author={Iker García-Ferrero and Begoña Altuna},
      year={2024},
      eprint={2404.07611},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
