roberta-large-fallacy-classification: Open-Source Text Classification Model


Roberta Large Fallacy Classification

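Developed by MidhunKanadan

A text classification model fine-tuned from roberta-large to identify 13 common types of logical fallacy

Text Classification

Transformers

Language: English · License: Apache-2.0 · #LogicalFallacyDetection #ArgumentQualityAssessment #CriticalThinkingSupport

Downloads: 26

Released: 11/9/2024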

Model Overview

This model classifies the logical fallacies found in text and is suited to education, argument analysis, and content moderation.

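Model Features

Multi-class fallacy recognition

Recognizes 13 types of logical fallacy, including equivocation, faulty generalization, and false causality.

Careful fine-tuning

Uses class weights to counter class imbalance and a low learning rate (2e-6) for fine-tuning.

Efficient inference

Accepts inputs of up to 128 tokens and runs quickly on a GPU.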
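
To make the 128-token limit concrete, here is a minimal sketch of what truncation and padding do to a batch of token-id sequences. It is plain Python and not the tokenizer's actual implementation: the helper name `truncate_and_pad` is hypothetical, the pad id of 1 is an assumption chosen for illustration, and real tokenizers also preserve special tokens when truncating, which this sketch ignores.

```python
from typing import List, Tuple

MAX_LENGTH = 128  # maximum input length stated for this model
PAD_ID = 1        # assumed pad token id, for illustration only


def truncate_and_pad(
    batch: List[List[int]], max_length: int = MAX_LENGTH, pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Cut each sequence to max_length, then pad to the longest remaining one.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding positions.
    """
    truncated = [seq[:max_length] for seq in batch]
    target = max((len(seq) for seq in truncated), default=0)
    input_ids = [seq + [pad_id] * (target - len(seq)) for seq in truncated]
    attention_mask = [[1] * len(seq) + [0] * (target - len(seq)) for seq in truncated]
    return input_ids, attention_mask


# One over-long sequence (200 ids) and one short sequence (3 ids).
ids, mask = truncate_and_pad([list(range(200)), [5, 6, 7]])
print(len(ids[0]), len(ids[1]))  # both rows now share one length
print(sum(mask[1]))              # number of real tokens in the short row
```

Anything beyond the first 128 tokens is discarded in this scheme, so longer arguments would need to be split into shorter passages before classification.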

Model Capabilities

Text classification

Logical fallacy detection

Argument quality assessment

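Use Cases

Education

Critical-thinking instruction

Teach logical reasoning and critical thinking by identifying common fallacies

Help students recognize and avoid logical errors in arguments

Content analysis

Argument validity assessment

Assess the validity of arguments in debates, papers, and articles

Provide a quantitative indicator of argument quality

Content moderation

Identify logical flaws in online debates or social media discussions

Raise the quality of discussion and reduce misleading statements

AI enhancement

Dialogue system enhancement

Strengthen the logical reasoning of dialogue systems

Make AI conversations more logical and persuasive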

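🚀 roberta-large Fallacy Classification Model

This model is a fine-tuned version of roberta-large, trained on a logical fallacy classification dataset. It classifies text into types of logical fallacy.

🚀 Quick Start

The model can be run through a text-classification pipeline for quick predictions. For example: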

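import torch
from transformers import pipeline

# Use the first GPU when available; otherwise fall back to CPU (device=-1).
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("text-classification", model="MidhunKanadan/roberta-large-fallacy-classification", device=device)
text = "The rooster crows always before the sun rises, therefore the crowing rooster causes the sun to rise."
result = pipe(text)[0]  # top-scoring label for the single input
print(f"Predicted Label: {result['label']}, Score: {result['score']:.4f}")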

Expected output:

Predicted Label: false causality, Score: 0.9632

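✨ Key Features

Fine-tuned from roberta-large to classify logical fallacies in text.
Supports 13 types of logical fallacy.
Uses class weights to handle dataset imbalance.
Tokenizes with truncation and padding (maximum length: 128).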
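
The source says class weights are used against imbalance but does not say how they were computed. As a minimal sketch, the snippet below shows one common scheme, inverse-frequency ("balanced") weighting, where each class gets `n_samples / (n_classes * class_count)`. The function name `balanced_class_weights` and the toy label counts are invented for illustration and do not reflect the real dataset or the author's training script.

```python
from collections import Counter
from typing import Dict, List


def balanced_class_weights(labels: List[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy label distribution, invented for illustration (not the real dataset).
toy_labels = ["ad hominem"] * 6 + ["false causality"] * 3 + ["equivocation"] * 1
weights = balanced_class_weights(toy_labels)

# Rare classes receive larger weights, so their errors count for more.
for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label}: {weight:.3f}")
```

Weights like these are typically passed to a weighted cross-entropy loss, for example through the `weight` argument of `torch.nn.CrossEntropyLoss`, so that mistakes on rare fallacy types are penalized more heavily than mistakes on frequent ones.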

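📚 Documentation

Model Details

Property	Details
Base model	`roberta-large`
Training dataset	Logical Fallacy Dataset
Number of classes	13
Learning rate	2e-6
Batch size	8 (with gradient accumulation; effective batch size 16)
Weight decay	0.01
Epochs	15
Mixed precision (FP16)	Enabled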
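
The batch-size row can be unpacked with a little arithmetic. The sketch below assumes training on a single device, in which case a per-step batch of 8 and an effective batch of 16 imply 2 gradient-accumulation steps. The training-set size is hypothetical and is there only to show how the number of optimizer updates follows from the settings in the table.

```python
import math

per_device_batch_size = 8   # from the table above
effective_batch_size = 16   # from the table above
num_epochs = 15             # from the table above

# Assumes a single device, so the gap is covered entirely by accumulation.
accumulation_steps = effective_batch_size // per_device_batch_size

# Hypothetical training-set size, chosen only to make the arithmetic concrete.
num_train_examples = 4000

# One optimizer update happens per effective batch.
updates_per_epoch = math.ceil(num_train_examples / effective_batch_size)
total_updates = updates_per_epoch * num_epochs

print(accumulation_steps)   # 2
print(updates_per_epoch)    # 250
print(total_updates)        # 3750
```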

Supported Fallacy Types

The model classifies text into the following types of logical fallacy:

Equivocation
Faulty Generalization
Fallacy of Logic
Ad Populum
Circular Reasoning
False Dilemma
False Causality
Fallacy of Extension
Fallacy of Credibility
Fallacy of Relevance
Intentional
Appeal to Emotion
Ad Hominem

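Dataset

Dataset name: Logical Fallacy Classification Dataset
Source: Logical Fallacy Classification Dataset
Number of classes: 13 fallacy types (for example, ad hominem, appeal to emotion, and faulty generalization)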

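Applications

Education: teach logical reasoning and critical thinking by identifying common fallacies.
Argument analysis: assess the validity of arguments in debates, papers, and articles.
AI assistants: add critical-reasoning capabilities to conversational AI systems.
Content moderation: identify logical flaws in online debates or social media discussions.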
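
For argument analysis or moderation, per-sentence predictions usually need to be aggregated over a whole document. The following is a minimal sketch of one way to do that, not a method described by the model's author. The names `fallacy_profile` and `stub_classifier` and the 0.9 threshold are assumptions. The classifier is passed in as a callable that returns one `{"label", "score"}` dict per input, which is the shape a text-classification pipeline returns for a list of strings, and a stub with made-up scores stands in for the real model here.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Prediction = Dict[str, Any]  # expected keys: "label" (str) and "score" (float)


def fallacy_profile(
    sentences: List[str],
    classify: Callable[[List[str]], List[Prediction]],
    min_score: float = 0.9,
) -> Dict[str, Any]:
    """Count confident fallacy predictions per label across sentences."""
    predictions = classify(sentences)
    confident = [p for p in predictions if float(p["score"]) >= min_score]
    counts = Counter(str(p["label"]) for p in confident)
    flagged_ratio = len(confident) / len(sentences) if sentences else 0.0
    return {"counts": dict(counts), "flagged_ratio": flagged_ratio}


def stub_classifier(texts: List[str]) -> List[Prediction]:
    """Stand-in for the real pipeline; returns canned, made-up predictions."""
    canned = [
        {"label": "false causality", "score": 0.96},
        {"label": "ad hominem", "score": 0.41},
        {"label": "false causality", "score": 0.93},
    ]
    return canned[: len(texts)]


profile = fallacy_profile(["sentence 1", "sentence 2", "sentence 3"], stub_classifier)
print(profile)
```

The threshold matters because the label list contains no "no fallacy" class, so the model assigns some fallacy type to every input. Treating low-confidence predictions as unflagged is a rough workaround, and the cutoff would need tuning on real data.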

💻 Usage Examples

Advanced Usage

The following code returns the predicted score for every label:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_path = "MidhunKanadan/roberta-large-fallacy-classification"
text = "The rooster crows always before the sun rises, therefore the crowing rooster causes the sun to rise."

# Run on the GPU when available, otherwise on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
# Truncate to the 128-token limit; padding only matters when batching inputs.
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128).to(device)

with torch.no_grad():
    # Softmax turns the logits into one probability per label.
    probs = F.softmax(model(**inputs).logits, dim=-1)
    results = {model.config.id2label[i]: score.item() for i, score in enumerate(probs[0])}

# Print scores for all labels, highest first
for label, score in sorted(results.items(), key=lambda x: x[1], reverse=True):
    print(f"{label}: {score:.4f}")

Expected output:

false causality: 0.9632
fallacy of logic: 0.0139
faulty generalization: 0.0054
intentional: 0.0029
fallacy of credibility: 0.0023
equivocation: 0.0022
fallacy of extension: 0.0020
ad hominem: 0.0019
circular reasoning: 0.0016
false dilemma: 0.0015
fallacy of relevance: 0.0013
ad populum: 0.0009
appeal to emotion: 0.0009
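
Once all label scores are available, a downstream application may want to accept the top label only when the model is clearly confident. The sketch below is one possible post-processing rule, not something the model card prescribes. The function name `confident_label` and both thresholds are assumptions, and the scores are copied from the expected output above.

```python
from typing import Dict, Optional, Tuple

# Scores copied from the expected output above.
scores: Dict[str, float] = {
    "false causality": 0.9632,
    "fallacy of logic": 0.0139,
    "faulty generalization": 0.0054,
    "intentional": 0.0029,
    "fallacy of credibility": 0.0023,
    "equivocation": 0.0022,
    "fallacy of extension": 0.0020,
    "ad hominem": 0.0019,
    "circular reasoning": 0.0016,
    "false dilemma": 0.0015,
    "fallacy of relevance": 0.0013,
    "ad populum": 0.0009,
    "appeal to emotion": 0.0009,
}


def confident_label(
    label_scores: Dict[str, float], min_score: float = 0.5, min_margin: float = 0.2
) -> Optional[Tuple[str, float]]:
    """Return (label, score) for the top label, or None when the prediction is weak.

    A prediction counts as weak if the top score is below min_score or if it
    leads the runner-up by less than min_margin. Expects at least two labels.
    """
    ranked = sorted(label_scores.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_score), (_, runner_up_score) = ranked[0], ranked[1]
    if top_score >= min_score and top_score - runner_up_score >= min_margin:
        return top_label, top_score
    return None


print(confident_label(scores))
# A flatter, made-up distribution is rejected by the same rule.
print(confident_label({"ad hominem": 0.40, "ad populum": 0.35, "equivocation": 0.25}))
```

Returning `None` for ambiguous inputs lets the caller route them to human review instead of reporting a fallacy type the model is unsure about.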