🚀 rubert_tiny2_russian_emotion_sentiment
rubert_tiny2_russian_emotion_sentiment is a fine-tuned version of the lightweight cointegrated/rubert-tiny2 model. It classifies Russian-language messages into five emotion classes.
🚀 Quick Start
Install dependencies
pip install transformers torch
Usage example
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
MODEL_ID = "Kostya165/rubert_tiny2_russian_emotion_sentiment"
# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()  # inference mode: disables dropout
texts = [
    "Сегодня отличный день!",
    "Меня это всё бесит и раздражает."
]
# Tokenize the batch, padding to the longest text and truncating at 128 tokens
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
# Pick the highest-scoring class per text and map class ids to label names
preds = logits.argmax(dim=-1).tolist()
id2label = model.config.id2label
labels = [id2label[p] for p in preds]
print(labels)
✨ Key Features
The model classifies Russian-language messages into five emotion classes:
- 0: aggression
- 1: anxiety
- 2: neutral
- 3: positive
- 4: sarcasm
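The usage example in Quick Start takes the argmax of the logits, which yields only the winning label. When a confidence score per emotion is also useful, the logits can be passed through a softmax. The sketch below is a minimal pure-Python illustration that assumes the id-to-label mapping listed above. `ID2LABEL` and `rank_emotions` are hypothetical names; in practice the mapping should be read from `model.config.id2label` and each input row taken from `logits.tolist()`.

```python
import math

# Assumed mapping, copied from the class list above; in real use prefer
# model.config.id2label so the code follows the checkpoint.
ID2LABEL = {0: "aggression", 1: "anxiety", 2: "neutral", 3: "positive", 4: "sarcasm"}


def rank_emotions(logit_row):
    """Return (label, probability) pairs for one text, most likely first."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logit_row)
    exps = [math.exp(x - peak) for x in logit_row]
    total = sum(exps)
    probs = [e / total for e in exps]
    pairs = [(ID2LABEL[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up logits for a single message (not real model output).
example_row = [0.2, -1.3, 0.1, 3.4, -0.5]
for label, prob in rank_emotions(example_row):
    print(f"{label}: {prob:.3f}")
```

Sorting by probability makes it easy to apply a threshold, for example treating a message as uncertain when the top probability is low.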
📚 Documentation
Validation results
| Metric | Value |
|---|---|
| Accuracy | 0.8911 |
| F1 macro | 0.8910 |
| F1 micro | 0.8911 |
Per-class accuracy:
- aggression (0): 0.9120
- anxiety (1): 0.9462
- neutral (2): 0.8663
- positive (3): 0.8884
- sarcasm (4): 0.8426
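The reported micro F1 equals the accuracy (0.8911). That is expected for single-label multi-class classification: every wrong prediction counts as one false positive and one false negative, so micro-averaged F1 collapses to accuracy. The sketch below illustrates this on made-up labels. It assumes that the per-class figures are the share of examples of each true class that were predicted correctly (per-class recall); the card does not state the exact definition, and `classification_summary` is a hypothetical helper.

```python
from collections import Counter


def classification_summary(y_true, y_pred, num_classes=5):
    """Compute accuracy, macro/micro F1 and per-class recall from label ids."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[t] += 1  # the true class gains a false negative

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    classes = range(num_classes)
    per_class_f1 = [f1(tp[c], fp[c], fn[c]) for c in classes]
    support = Counter(y_true)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "f1_macro": sum(per_class_f1) / num_classes,
        "f1_micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        "per_class_recall": {c: tp[c] / support[c] for c in classes if support[c]},
    }


# Toy labels for illustration only (not the model's validation data).
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 4, 1, 1, 2, 3, 3, 3, 4, 0]
summary = classification_summary(y_true, y_pred)
print(summary)
```

Macro F1 averages the per-class F1 scores with equal weight, so it can differ from accuracy when classes are imbalanced or unevenly difficult.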
Training details
- Base model: cointegrated/rubert-tiny2
- Dataset: Kostya165/ru_emotion_dvach
- Epochs: 2
- Batch size: 32
- Learning rate: 1e-5
- Mixed precision: FP16
- Regularization: dropout 0.1, weight_decay 0.01, warmup_ratio 0.1
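As a rough illustration of how these hyperparameters fit together, the sketch below collects them under the keyword names used by `transformers.TrainingArguments` and estimates how many optimizer steps the warmup covers. The original training script is not shown in this card, so this is an assumption about the setup rather than a reproduction of it: it treats the batch size of 32 as per-device, assumes a single device with no gradient accumulation, and uses a hypothetical dataset size. `TRAINING_KWARGS` and `schedule_lengths` are illustrative names. Dropout is left out because it is typically set on the model config instead of the training arguments.

```python
import math

# Keyword names follow transformers.TrainingArguments; values come from the
# list above. Treating the batch size of 32 as per-device is an assumption.
TRAINING_KWARGS = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 32,
    "learning_rate": 1e-5,
    "fp16": True,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
}


def schedule_lengths(num_examples, kwargs=TRAINING_KWARGS):
    """Estimate total optimizer steps and warmup steps on a single device."""
    steps_per_epoch = math.ceil(num_examples / kwargs["per_device_train_batch_size"])
    total_steps = steps_per_epoch * kwargs["num_train_epochs"]
    # Round up so that a non-zero ratio always yields at least one warmup step.
    warmup_steps = math.ceil(total_steps * kwargs["warmup_ratio"])
    return total_steps, warmup_steps


# 10_000 is a hypothetical dataset size, not the size of ru_emotion_dvach.
total, warmup = schedule_lengths(10_000)
print(f"total steps: {total}, warmup steps: {warmup}")
```

With a warmup ratio of 0.1, roughly the first tenth of training ramps the learning rate up toward 1e-5 before the scheduler takes over.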
Dependencies
transformers>=4.30.0
torch>=1.10.0
datasets
evaluate
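🔧 Technical Details
The model is fine-tuned from cointegrated/rubert-tiny2 on the Kostya165/ru_emotion_dvach dataset for 2 epochs with a batch size of 32, a learning rate of 1e-5, and FP16 mixed precision, using dropout 0.1, weight_decay 0.01, and warmup_ratio 0.1 for regularization. On the validation set it reaches 0.8911 accuracy, with per-class accuracy between 0.8426 (sarcasm) and 0.9462 (anxiety).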
📄 License
CC-BY-SA 4.0.
Citation
@misc{rubert_tiny2_russian_emotion_sentiment,
title = {Russian Emotion Sentiment Classification with RuBERT-tiny2},
author = {Kostya165},
year = {2024},
howpublished = {\url{https://huggingface.co/Kostya165/rubert_tiny2_russian_emotion_sentiment}}
}