🚀 WRAP -- A TACO-based Classifier For Inference and Information-Driven Argument Mining on Twitter
WRAP is a classification model built on `AutoModelForSequenceClassification`. It classifies tweets from the TACO dataset into four distinct classes: Reason, Statement, Notification, and None. The model specializes in extracting information and inferences from Twitter data and takes its name from WRAPresentations, an extension of the BERTweet-base architecture whose embeddings were fine-tuned on augmented tweets using contrastive learning to better encode inference and information in tweets.
✨ Features
Class Semantics
The TACO framework is centered around two key elements of an argument, as defined by the Cambridge Dictionary: it defines *inference* as "a guess that you make or an opinion that you form based on the information that you have", and *information* as "facts or details about a person, company, product, etc.".
WRAP can identify specific tweet classes where inferences and information can be aggregated in relation to these distinct classes:
- Statement: Refers to unique cases where only the inference is presented as something that someone says or writes officially, or an action done to express an opinion.
- Reason: Represents a full argument, where the inference is based on direct information mentioned in the tweet (such as a source reference or quotation), revealing the author's motivation to understand and judge based on practical facts.
- Notification: A tweet that only provides information, such as media channels promoting their latest articles.
- None: A tweet that provides neither inference nor information.
WRAP can classify tweets in the following hierarchy:
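The hierarchy follows directly from the class semantics above: each class is defined by whether it carries the inference and/or the information component. A small illustrative sketch (the mapping is derived from the definitions above, not from the model's code):

```python
# Each TACO class is defined by which argument components it carries,
# per the class definitions above.
HIERARCHY = {
    #  class:       (has_inference, has_information)
    "Reason":       (True,  True),   # inference grounded in in-tweet information
    "Statement":    (True,  False),  # inference only
    "Notification": (False, True),   # information only
    "None":         (False, False),  # neither component
}

def category(label: str) -> set[str]:
    """Map a class label to the category aggregation(s) it belongs to."""
    inference, information = HIERARCHY[label]
    cats = set()
    if inference:
        cats.add("Inference")
    if information:
        cats.add("Information")
    return cats

print(sorted(category("Reason")))  # ['Inference', 'Information']
```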
📦 Installation
Using this model is easy when you have `transformers` installed:

```shell
pip install -U transformers
```
💻 Usage Examples
Basic Usage
```python
from transformers import pipeline

pipe = pipeline("text-classification", model="TomatenMarc/WRAP")
prediction = pipe("Huggingface is awesome")
print(prediction)
```
Notice: The tweets need to undergo preprocessing before classification.
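Since WRAP builds on the BERTweet-base architecture, one plausible preprocessing step is BERTweet-style normalization, which masks user mentions and URLs. A minimal sketch under that assumption (adapt to the exact preprocessing used for TACO):

```python
import re

def normalize_tweet(text: str) -> str:
    """BERTweet-style normalization: mask user mentions and URLs."""
    text = re.sub(r"@\w+", "@USER", text)            # user mentions -> @USER
    text = re.sub(r"https?://\S+", "HTTPURL", text)  # links -> HTTPURL
    return text.strip()

print(normalize_tweet("@nasa check this https://t.co/abc"))
# @USER check this HTTPURL
```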
🔧 Technical Details
Training
The final model was trained using the entire shuffled ground truth dataset TACO, which has 1734 tweets in total. The topic distribution in this dataset is: #abortion (25.9%), #brexit (29.0%), #got (11.0%), #lotrrop (12.1%), #squidgame (12.7%), and #twittertakeover (9.3%). We used SimpleTransformers for training.
The category and class distribution of the TACO dataset is as follows:
| Inference | No-Inference |
|---|---|
| 865 (49.88%) | 869 (50.12%) |

| Information | No-Information |
|---|---|
| 1081 (62.34%) | 653 (37.66%) |

| Reason | Statement | Notification | None |
|---|---|---|---|
| 581 (33.50%) | 284 (16.38%) | 500 (28.84%) | 369 (21.28%) |
Notice: WRAP was trained to predict the four classes; the categories (inference/information) are aggregations of those classes based on their inference or information component.
Dataloader
"data_loader": {
"type": "torch.utils.data.dataloader.DataLoader",
"args": {
"batch_size": 8,
"sampler": "torch.utils.data.sampler.RandomSampler"
}
}
Parameters of the fit()-Method
```json
{
    "epochs": 5,
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 4e-05
    },
    "scheduler": "WarmupLinear",
    "warmup_steps": 66
}
```
Evaluation
We used 6-fold (Closed-Topic) cross-validation to show WRAP's optimal performance, with the same dataset and parameters as in the Training section: the model is trained on k - 1 splits and makes predictions on the held-out kth split.
We also assessed its generalization ability across the 6 topics of TACO (Cross-Topic): each topic is used once for testing, while the remaining k - 1 topics are used for training.
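The Cross-Topic protocol amounts to leave-one-topic-out splitting. A pure-Python illustration (the `topics` list and function name are hypothetical, not from the released code):

```python
def cross_topic_splits(topics: list[str]):
    """Leave-one-topic-out: test on one topic, train on all the others."""
    for held_out in sorted(set(topics)):
        train = [i for i, t in enumerate(topics) if t != held_out]
        test = [i for i, t in enumerate(topics) if t == held_out]
        yield held_out, train, test

# Toy example: each entry is the hashtag of one tweet.
topics = ["#brexit", "#got", "#brexit", "#abortion", "#got"]
for topic, train_idx, test_idx in cross_topic_splits(topics):
    print(topic, train_idx, test_idx)
```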
In total, the WRAP classifier performs as follows:
Binary and Multi-Class Tasks (Macro-F1)

| Macro-F1 | Inference | Information | Multi-Class |
|---|---|---|---|
| Closed-Topic | 86.62% | 86.30% | 75.29% |
| Cross-Topic | 86.27% | 84.90% | 73.54% |
Multi-Class Classification Task
| Micro-F1 | Reason | Statement | Notification | None |
|---|---|---|---|---|
| Closed-Topic | 78.14% | 60.96% | 79.36% | 82.72% |
| Cross-Topic | 77.05% | 58.33% | 78.45% | 80.33% |
📄 License
WRAP © 2023 is licensed under CC BY-NC-SA 4.0
📖 Citation
@inproceedings{feger-dietze-2024-bertweets,
title = "{BERT}weet{'}s {TACO} Fiesta: Contrasting Flavors On The Path Of Inference And Information-Driven Argument Mining On {T}witter",
author = "Feger, Marc and
Dietze, Stefan",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2024",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-naacl.146",
doi = "10.18653/v1/2024.findings-naacl.146",
pages = "2256--2266"
}