# RoBERTArg
RoBERTArg is a model that classifies text into two labels: NON-ARGUMENT (0) and ARGUMENT (1). It was trained on a dataset of controversial topics, making it a useful tool for argument mining.
## Quick Start

The model has been trained to classify text by whether or not it presents an argument. You can use it as a starting point for research in argument mining.
## Features

- Heterogeneous training data: trained on ~25k manually annotated sentences covering controversial topics.
- Binary classification: classifies text into the NON-ARGUMENT and ARGUMENT labels.
## Installation

The model can be used via the Hugging Face `transformers` library (with PyTorch): `pip install transformers torch`.
## Usage Examples

The model can be loaded with the Hugging Face `transformers` text-classification `pipeline`.
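A minimal sketch using the `transformers` pipeline API. Note that the Hub model ID `chkla/Roberta-Argument` is an assumption (it is not stated in this card); substitute the correct ID if it differs.

```python
# Hedged sketch: classifying sentences with the transformers pipeline API.
# The Hub model ID "chkla/Roberta-Argument" is assumed, not stated in this card.
from transformers import pipeline


def classify(texts):
    """Return label/score predictions for a list of sentences."""
    classifier = pipeline("text-classification", model="chkla/Roberta-Argument")
    return classifier(texts)


if __name__ == "__main__":
    result = classify(
        ["Marijuana should be legalized because prohibition has failed to reduce its use."]
    )
    print(result)  # a list of dicts with 'label' and 'score' keys
```

Each prediction is a dict containing the predicted label and a confidence score between 0 and 1.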
## Documentation
### Model description

This model was trained on ~25k heterogeneous, manually annotated sentences ([Stab et al. 2018](https://www.aclweb.org/anthology/D18-1402/)) covering controversial topics to classify text into one of two labels: NON-ARGUMENT (0) and ARGUMENT (1).
### Dataset

In the dataset (Stab et al. 2018), a sentence is labeled as an ARGUMENT (~11k sentences) if it supports or opposes the topic and gives a relevant reason for that stance, and as a NON-ARGUMENT (~14k sentences) if it gives no reason. The authors focus on controversial topics, i.e., topics with "an obvious polarity to the possible outcomes", and compile a final set of eight: abortion, school uniforms, death penalty, marijuana legalization, nuclear energy, cloning, gun control, and minimum wage.
| Topic | ARGUMENT | NON-ARGUMENT |
| --- | --- | --- |
| abortion | 2,213 | 2,427 |
| school uniforms | 325 | 1,734 |
| death penalty | 325 | 2,083 |
| marijuana legalization | 325 | 1,262 |
| nuclear energy | 325 | 2,118 |
| cloning | 325 | 1,494 |
| gun control | 325 | 1,889 |
| minimum wage | 325 | 1,346 |
### Model training

RoBERTArg was fine-tuned from a pre-trained RoBERTa (base) model from Hugging Face using the Hugging Face `Trainer` with the following hyperparameters:
```python
from transformers import TrainingArguments

# These arguments are passed to transformers.Trainer for fine-tuning.
training_args = TrainingArguments(
    output_dir="./results",  # assumed; the output directory is not given in the original card
    num_train_epochs=2,
    learning_rate=2.3102e-06,
    seed=8,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
)
```
### Evaluation

The model was evaluated on a held-out evaluation set (20% of the data); R and P denote per-class recall and precision:
| Model | Acc | F1 | R arg | R non | P arg | P non |
| --- | --- | --- | --- | --- | --- | --- |
| RoBERTArg | 0.8193 | 0.8021 | 0.8463 | 0.7986 | 0.7623 | 0.8719 |
The confusion matrix on the same evaluation set (rows = actual class, columns = predicted class; the label order shown is the one consistent with the per-class recall and precision reported above):

| | NON-ARGUMENT | ARGUMENT |
| --- | --- | --- |
| **NON-ARGUMENT** | 2,213 | 558 |
| **ARGUMENT** | 325 | 1,790 |
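The reported scores can be re-derived from the confusion matrix. Note that the per-class recall and precision values only line up if the first row and column are read as NON-ARGUMENT and the second as ARGUMENT, with rows as actual and columns as predicted classes. A quick check:

```python
# Re-deriving the evaluation scores from the confusion matrix.
# Reading: rows = actual class, columns = predicted class,
# in the order [NON-ARGUMENT, ARGUMENT] -- the only reading
# consistent with the reported per-class recall and precision.
cm = [
    [2213, 558],   # actual NON-ARGUMENT: predicted NON / predicted ARG
    [325, 1790],   # actual ARGUMENT:     predicted NON / predicted ARG
]
tn, fp = cm[0]
fn, tp = cm[1]
total = tn + fp + fn + tp

accuracy = (tp + tn) / total            # ~0.8193
recall_arg = tp / (tp + fn)             # ~0.8463
recall_non = tn / (tn + fp)             # ~0.7986
precision_arg = tp / (tp + fp)          # ~0.7623
precision_non = tn / (tn + fn)          # ~0.8719
f1_arg = 2 * tp / (2 * tp + fp + fn)    # ~0.8021 (F1 for the ARGUMENT class)

print(accuracy, f1_arg, recall_arg, recall_non, precision_arg, precision_non)
```

This reproduces the Acc, F1, R, and P values in the evaluation table above, which suggests the reported F1 is the F1 score of the ARGUMENT class.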
## Important Note

The model can only be a starting point for diving into the exciting field of argument mining. Be aware that an argument is a complex structure with multiple dependencies, so the model may perform worse on topics and text types not included in the training set.
## Twitter

Follow the developer on Twitter: @chklamm