roberta-base-imdb: an open-source sentiment analysis model for quickly classifying the sentiment of text, free to use

Roberta Base Imdb

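Developed by aychang

A RoBERTa base model trained on the IMDB dataset for sentiment analysis.

Text classification · English · License: MIT · #IMDB sentiment analysis #High accuracy (94.67%) #Movie review classification

Downloads: 341

Released: 3/2/2022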

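Model Overview

This is a sentiment analysis model built on the RoBERTa architecture and trained on the IMDB movie review dataset. It classifies a text as positive or negative.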

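Model Features

High accuracy

Reaches 94.67% accuracy on the IMDB test set.

Balanced performance

F1 scores are nearly identical for the two classes (positive 0.947, negative 0.946).

Efficient inference

Evaluation ran at about 102 samples per second.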

Model Capabilities

Text sentiment analysis

Movie review classification

English text processing

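Use Cases

Film and TV review analysis

Movie review sentiment classification

Automatically decides whether a user's review of a movie is positive or negative.

Accuracy: 94.67%

Product feedback analysis

User review sentiment analysis

Analyzes the sentiment of user reviews of a product.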

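🚀 IMDB Sentiment Task: roberta-base

This project trains a RoBERTa base model on the IMDB dataset for text sentiment classification, so it can judge whether a text is positive or negative.

🚀 Quick Start

Model description

A simple RoBERTa base model trained on the "imdb" dataset.

Intended uses and limitations

How to use

Transformers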

# Load model and tokenizer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Define the checkpoint name before it is used below
model_name = "aychang/roberta-base-imdb"

# Sentiment analysis is a sequence classification task, so use the matching model class
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Use pipeline
from transformers import pipeline

nlp = pipeline("sentiment-analysis", model=model_name, tokenizer=model_name)

results = nlp(["I didn't really like it because it was so terrible.", "I love how easy it is to watch and get good results."])
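The pipeline call above returns one dictionary per input text, each with a `label` and a `score` key. The sketch below shows one way to pair those dictionaries with their inputs. The helper name `pair_results`, the values in `sample_results`, and the label strings `"neg"` and `"pos"` are hypothetical placeholders for illustration, not real model output; the actual label names depend on the model's configuration.

```python
from typing import Dict, List, Tuple


def pair_results(texts: List[str], results: List[Dict]) -> List[Tuple[str, str, float]]:
    """Attach each input text to its predicted label and confidence score."""
    if len(texts) != len(results):
        raise ValueError("texts and results must have the same length")
    return [(text, res["label"], float(res["score"])) for text, res in zip(texts, results)]


texts = [
    "I didn't really like it because it was so terrible.",
    "I love how easy it is to watch and get good results.",
]

# Hypothetical placeholder values in the pipeline's output shape (not real model output).
sample_results = [
    {"label": "neg", "score": 0.99},
    {"label": "pos", "score": 0.98},
]

paired = pair_results(texts, sample_results)
for text, label, score in paired:
    print(f"{label} ({score:.2f}): {text}")
```

In real use, `sample_results` would be replaced by the `results` list returned by `nlp(...)`.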

AdaptNLP

from adaptnlp import EasySequenceClassifier

model_name = "aychang/roberta-base-imdb"
texts = ["I didn't really like it because it was so terrible.", "I love how easy it is to watch and get good results."]

# Instantiate the classifier before calling tag_text on it
classifier = EasySequenceClassifier()
results = classifier.tag_text(text=texts, model_name_or_path=model_name, mini_batch_size=2)

Limitations and bias

This is a base language model trained on a benchmark dataset.

📚 Detailed Documentation

Training data

IMDB https://huggingface.co/datasets/imdb

Training procedure

Hardware

One V100

Hyperparameters and training arguments

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./models',
    overwrite_output_dir=False,
    num_train_epochs=2,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=500,
    weight_decay=0.01,
    evaluation_strategy="steps",
    logging_dir='./logs',
    fp16=False,
    eval_steps=800,
    save_steps=300000
)
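To see what these settings imply in practice, the step counts can be worked out by hand. The sketch below assumes the 25,000-example IMDB training split, a single GPU (as listed under hardware), and the default gradient accumulation of 1. None of these is stated alongside the arguments themselves, so treat the numbers as estimates.

```python
import math

# Assumptions (not stated next to the arguments): IMDB train split size, one GPU,
# and no gradient accumulation.
train_examples = 25_000
num_gpus = 1

# Values copied from the training arguments above.
per_device_train_batch_size = 8
num_train_epochs = 2
warmup_steps = 500
eval_steps = 800
save_steps = 300_000

steps_per_epoch = math.ceil(train_examples / (per_device_train_batch_size * num_gpus))
total_steps = steps_per_epoch * num_train_epochs
warmup_fraction = warmup_steps / total_steps
num_evaluations = total_steps // eval_steps          # step-based evaluations during training
saves_during_training = total_steps // save_steps    # step-based checkpoints during training

print(f"steps per epoch:       {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"warmup fraction:       {warmup_fraction:.2%}")
print(f"step-based evals:      {num_evaluations}")
print(f"step-based saves:      {saves_during_training}")
```

Under these assumptions, `save_steps` is far larger than the total number of steps, so the step-based save schedule would not write any intermediate checkpoint during the run.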

Evaluation results

{'epoch': 2.0,
 'eval_accuracy': 0.94668,
 'eval_f1': array([0.94603457, 0.94731017]),
 'eval_loss': 0.2578844428062439,
 'eval_precision': array([0.95762642, 0.93624502]),
 'eval_recall': array([0.93472, 0.95864]),
 'eval_runtime': 244.7522,
 'eval_samples_per_second': 102.144}
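As a sanity check, several of these figures can be recomputed from the others. The sketch below uses F1 = 2PR / (P + R) per class, averages the two recalls to get accuracy, and divides the sample count by the runtime to get throughput. It assumes the evaluation ran on the 25,000-example IMDB test split with equal class sizes, and that the arrays are ordered [negative, positive] following the IMDB dataset's label encoding.

```python
# Values copied from the evaluation results above.
precision = [0.95762642, 0.93624502]
recall = [0.93472, 0.95864]
eval_runtime = 244.7522  # seconds

# Assumption: the IMDB test split, 25,000 examples with equal class sizes.
test_examples = 25_000

# Per-class F1 is the harmonic mean of precision and recall.
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

# With equally sized classes, overall accuracy equals the mean of the per-class recalls.
accuracy = sum(recall) / len(recall)

# Throughput is the number of evaluated samples divided by the runtime.
throughput = test_examples / eval_runtime

print("F1 per class:", [round(x, 5) for x in f1])
print("accuracy:", round(accuracy, 5))
print("samples per second:", round(throughput, 3))
```

The recomputed values agree with the reported `eval_f1`, `eval_accuracy`, and `eval_samples_per_second` to within rounding, which supports the assumptions about the test split.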