ruRoberta-large-ru-go-emotions: an open-source model for detecting 27 emotion types in Russian text


Ruroberta Large Ru Go Emotions

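Developed by fyaronskiy

A multi-label emotion classification model fine-tuned from ruRoberta-large. It detects 27 emotion types in Russian text, and the model card describes it as the best-performing open-source emotion detection model for Russian at the time of release.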

Text classification

Transformers

License: MIT #Russian multi-emotion detection #high-accuracy emotion analysis #multi-label classification

Downloads: 813

Released: 8/19/2024

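Model overview

The model is fine-tuned on the ru_go_emotions dataset for multi-label emotion classification of Russian text. It recognizes 27 emotion types, such as admiration, anger, and joy, plus a neutral label.

Model features

Multi-label emotion classification

Detects several emotions in the same text at once, rather than assigning a single emotion.

Per-label threshold optimization

A separate decision threshold is tuned for each emotion on the validation set to maximize macro F1.

Performance

Reaches a macro F1 of 0.48 on Russian emotion detection, which the model card reports as the best result among open-source models at the time of release.

ONNX support

ONNX and INT8 quantized versions are provided, with inference up to about 2.5x faster.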
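To make the multi-label idea concrete, the sketch below contrasts independent per-label sigmoid scores with a single softmax distribution. The logits and the three-label subset are made-up values for illustration, not model output, and the 0.5 cutoff is an arbitrary stand-in for the per-label thresholds.

```python
import math

def sigmoid(x):
    """Squash one logit into an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Turn logits into one distribution that sums to 1 (single-label setting)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three labels (illustrative values only).
labels = ["admiration", "love", "anger"]
logits = [2.0, 1.5, -3.0]

sigmoid_scores = [sigmoid(x) for x in logits]
softmax_scores = softmax(logits)

# Multi-label: every label whose own score clears the cutoff is kept.
multi_label = [name for name, s in zip(labels, sigmoid_scores) if s > 0.5]

# Single-label: only the highest-scoring label is kept.
single_label = labels[softmax_scores.index(max(softmax_scores))]

print(multi_label, single_label)
```

With sigmoid scoring, each label is judged on its own, so two labels can be returned for the same text; softmax forces the labels to compete for one probability mass.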

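Model capabilities

Emotion analysis of Russian text

Multi-label emotion detection

Emotion probability prediction

Emotion intensity estimation

Use cases

Social media analysis

Emotion analysis of user comments

Analyzes the emotional tone of user comments on social media.

Can identify the dominant emotions in a comment, such as joy or anger.

Customer service

Emotion detection in customer feedback

Automatically analyzes the emotional state expressed in customer feedback.

Helps distinguish dissatisfied customers (anger, disappointment) from satisfied ones (gratitude, joy).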

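🚀 ruRoberta-large-ru-go-emotions emotion classification model

An open-source model for Russian that detects 27 emotion types (plus a neutral label) in text.

🚀 Quick start

This is a multi-label classification model obtained by fine-tuning ruRoberta-large on the ru_go_emotions dataset. It can be used to extract all emotions from a text or to detect specific ones. The thresholds were selected on the validation set by maximizing macro F1 over all labels.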
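The source does not describe the exact search procedure behind the thresholds. As a minimal sketch, one common approach is a per-label grid search: because macro F1 is the mean of the per-label F1 scores, each label's threshold can be tuned independently. The 50-point grid is an assumption; it is merely consistent with the published thresholds, which are all multiples of 1/49. The function names and the toy validation data are hypothetical.

```python
def binary_f1(y_true, y_pred):
    """F1 for one label; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def select_thresholds(probas, targets, grid_size=50):
    """Pick, for each label, the grid threshold with the highest F1.

    probas:  list of rows, one probability per label (validation set)
    targets: list of rows, one 0/1 value per label (validation set)
    grid_size=50 is an assumption, not a documented setting.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    n_labels = len(probas[0])
    best = []
    for j in range(n_labels):
        y_true = [row[j] for row in targets]
        scores = [row[j] for row in probas]
        best_t, best_f1 = grid[0], -1.0
        for t in grid:
            f1 = binary_f1(y_true, [int(s > t) for s in scores])
            if f1 > best_f1:  # keep the lowest threshold among ties
                best_t, best_f1 = t, f1
        best.append(best_t)
    return best

# Toy validation data with two labels (illustrative values only).
val_probas = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.05], [0.1, 0.25]]
val_targets = [[1, 0], [1, 1], [0, 0], [0, 1]]
toy_thresholds = select_thresholds(val_probas, val_targets)
print(toy_thresholds)
```

On real data, `probas` would be the sigmoid outputs of the model on the validation split and `targets` the corresponding multi-hot labels.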

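✨ Key features

Multi-emotion detection: detects 27 different emotion types, plus a neutral label.
Multiple formats: a full-precision ONNX model and an INT8 quantized model are provided, which speed up inference while keeping macro F1 essentially unchanged.
Performance: macro F1 of 0.48 on the ru_go_emotions test set, with per-label precision and recall reported in the evaluation results.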

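📦 Installation

To use the model, install the transformers library. The examples also import torch, and the ONNX examples additionally need optimum with ONNX Runtime:

pip install transformers torch
pip install "optimum[onnxruntime]"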

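💻 Usage examples

Basic usage

The following code loads the model with the Hugging Face Transformers library and defines the per-label thresholds and label names: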

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("fyaronskiy/ruRoberta-large-ru-go-emotions")
model = AutoModelForSequenceClassification.from_pretrained("fyaronskiy/ruRoberta-large-ru-go-emotions")

best_thresholds = [0.36734693877551017, 0.2857142857142857, 0.2857142857142857, 0.16326530612244897, 0.14285714285714285, 0.14285714285714285, 0.18367346938775508, 0.3469387755102041, 0.32653061224489793, 0.22448979591836732, 0.2040816326530612, 0.2857142857142857, 0.18367346938775508, 0.2857142857142857, 0.24489795918367346, 0.7142857142857142, 0.02040816326530612, 0.3061224489795918, 0.44897959183673464, 0.061224489795918366, 0.18367346938775508, 0.04081632653061224, 0.08163265306122448, 0.1020408163265306, 0.22448979591836732, 0.3877551020408163, 0.3469387755102041, 0.24489795918367346]
LABELS = ['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']
ID2LABEL = dict(enumerate(LABELS))

Advanced usage

Extracting the emotions in a text

def predict_emotions(text):
    # Tokenize a single text; inputs longer than 128 tokens are truncated.
    inputs = tokenizer(text, truncation=True, add_special_tokens=True, max_length=128, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    # Independent sigmoid per label (multi-label), then drop the batch dimension.
    probas = torch.sigmoid(logits).squeeze(dim=0)
    # A label is predicted when its probability exceeds that label's threshold.
    class_binary_labels = (probas > torch.tensor(best_thresholds)).int()
    return [ID2LABEL[label_id] for label_id, value in enumerate(class_binary_labels) if value == 1]

print(predict_emotions('У вас отличный сервис и лучший кофе в городе, обожаю вашу кофейню!'))

# ['admiration', 'love']

Getting all emotions with their scores

def predict(text):
    inputs = tokenizer(text, truncation=True, add_special_tokens=True, max_length=128, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    probas = torch.sigmoid(logits).squeeze(dim=0).tolist()
    probas = [round(proba, 3) for proba in probas]    
    
    labels2probas = dict(zip(LABELS, probas))
    probas_dict_sorted = dict(sorted(labels2probas.items(), key=lambda x: x[1], reverse=True))
    return probas_dict_sorted

print(predict('У вас отличный сервис и лучший кофе в городе, обожаю вашу кофейню!'))
'''{'admiration': 0.81,
 'love': 0.538,
 'joy': 0.041,
 'gratitude': 0.031,
 'approval': 0.026,
 'excitement': 0.023,
 'neutral': 0.009,
 'curiosity': 0.006,
 'amusement': 0.005,
 'desire': 0.005,
 'realization': 0.005,
 'caring': 0.004,
 'confusion': 0.004,
 'surprise': 0.004,
 'disappointment': 0.003,
 'disapproval': 0.003,
 'anger': 0.002,
 'annoyance': 0.002,
 'disgust': 0.002,
 'fear': 0.002,
 'grief': 0.002,
 'optimism': 0.002,
 'pride': 0.002,
 'relief': 0.002,
 'sadness': 0.002,
 'embarrassment': 0.001,
 'nervousness': 0.001,
 'remorse': 0.001}
'''
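As a sketch of how the score dictionary relates to the thresholded prediction, the snippet below applies per-label thresholds to the scores printed above, without loading the model. The helper name `emotions_above_threshold` is hypothetical, and the thresholds are the two-decimal values from the evaluation table rather than the full-precision list, so borderline cases could differ slightly from the original function.

```python
# Per-label thresholds, rounded to two decimals (from the evaluation table).
THRESHOLDS = {
    'admiration': 0.37, 'amusement': 0.29, 'anger': 0.29, 'annoyance': 0.16,
    'approval': 0.14, 'caring': 0.14, 'confusion': 0.18, 'curiosity': 0.35,
    'desire': 0.33, 'disappointment': 0.22, 'disapproval': 0.20, 'disgust': 0.29,
    'embarrassment': 0.18, 'excitement': 0.29, 'fear': 0.24, 'gratitude': 0.71,
    'grief': 0.02, 'joy': 0.31, 'love': 0.45, 'nervousness': 0.06,
    'optimism': 0.18, 'pride': 0.04, 'realization': 0.08, 'relief': 0.10,
    'remorse': 0.22, 'sadness': 0.39, 'surprise': 0.35, 'neutral': 0.24,
}

# Scores copied from the example output for the coffee-shop sentence.
SCORES = {
    'admiration': 0.81, 'love': 0.538, 'joy': 0.041, 'gratitude': 0.031,
    'approval': 0.026, 'excitement': 0.023, 'neutral': 0.009, 'curiosity': 0.006,
    'amusement': 0.005, 'desire': 0.005, 'realization': 0.005, 'caring': 0.004,
    'confusion': 0.004, 'surprise': 0.004, 'disappointment': 0.003,
    'disapproval': 0.003, 'anger': 0.002, 'annoyance': 0.002, 'disgust': 0.002,
    'fear': 0.002, 'grief': 0.002, 'optimism': 0.002, 'pride': 0.002,
    'relief': 0.002, 'sadness': 0.002, 'embarrassment': 0.001,
    'nervousness': 0.001, 'remorse': 0.001,
}

def emotions_above_threshold(scores, thresholds):
    """Return the labels whose score exceeds that label's own threshold."""
    return [label for label, t in thresholds.items() if scores.get(label, 0.0) > t]

def has_emotion(scores, thresholds, label):
    """Check one specific emotion against its threshold."""
    return scores.get(label, 0.0) > thresholds[label]

detected = emotions_above_threshold(SCORES, THRESHOLDS)
print(detected)
```

Note that a fixed cutoff such as 0.5 would give the same answer for this sentence, but not in general: labels such as grief or pride use much lower thresholds.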

📚 Detailed documentation

Model evaluation results

Evaluation results on the ru_go_emotions test set:

Emotion	Precision	Recall	F1 score	Support	Threshold
admiration	0.63	0.75	0.69	504	0.37
amusement	0.76	0.91	0.83	264	0.29
anger	0.47	0.32	0.38	198	0.29
annoyance	0.33	0.39	0.36	320	0.16
approval	0.27	0.58	0.37	351	0.14
caring	0.32	0.59	0.41	135	0.14
confusion	0.41	0.52	0.46	153	0.18
curiosity	0.45	0.73	0.55	284	0.35
desire	0.54	0.31	0.40	83	0.33
disappointment	0.31	0.34	0.33	151	0.22
disapproval	0.31	0.57	0.40	267	0.20
disgust	0.44	0.40	0.42	123	0.29
embarrassment	0.48	0.38	0.42	37	0.18
excitement	0.29	0.43	0.34	103	0.29
fear	0.56	0.78	0.65	78	0.24
gratitude	0.95	0.85	0.89	352	0.71
grief	0.03	0.33	0.05	6	0.02
joy	0.48	0.58	0.53	161	0.31
love	0.73	0.84	0.78	238	0.45
nervousness	0.24	0.48	0.32	23	0.06
optimism	0.57	0.54	0.56	186	0.18
pride	0.67	0.38	0.48	16	0.04
realization	0.18	0.31	0.23	145	0.08
relief	0.30	0.27	0.29	11	0.10
remorse	0.53	0.84	0.65	56	0.22
sadness	0.56	0.53	0.55	156	0.39
surprise	0.55	0.57	0.56	141	0.35
neutral	0.59	0.79	0.68	1787	0.24
micro avg	0.50	0.66	0.57	6329
macro avg	0.46	0.55	0.48	6329
weighted avg	0.53	0.66	0.58	6329
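The difference between the macro and weighted averages can be checked directly from the per-label rows. The sketch below recomputes both from the rounded F1 scores and supports in the table; because the inputs are already rounded to two decimals, the results only approximate the reported 0.48 and 0.58. The micro average is not recomputed, since it needs the raw true-positive, false-positive, and false-negative counts that the table does not list.

```python
# (label, F1, support) copied from the evaluation table.
ROWS = [
    ('admiration', 0.69, 504), ('amusement', 0.83, 264), ('anger', 0.38, 198),
    ('annoyance', 0.36, 320), ('approval', 0.37, 351), ('caring', 0.41, 135),
    ('confusion', 0.46, 153), ('curiosity', 0.55, 284), ('desire', 0.40, 83),
    ('disappointment', 0.33, 151), ('disapproval', 0.40, 267), ('disgust', 0.42, 123),
    ('embarrassment', 0.42, 37), ('excitement', 0.34, 103), ('fear', 0.65, 78),
    ('gratitude', 0.89, 352), ('grief', 0.05, 6), ('joy', 0.53, 161),
    ('love', 0.78, 238), ('nervousness', 0.32, 23), ('optimism', 0.56, 186),
    ('pride', 0.48, 16), ('realization', 0.23, 145), ('relief', 0.29, 11),
    ('remorse', 0.65, 56), ('sadness', 0.55, 156), ('surprise', 0.56, 141),
    ('neutral', 0.68, 1787),
]

total_support = sum(support for _, _, support in ROWS)

# Macro average: every label counts equally, however rare it is.
macro_f1 = sum(f1 for _, f1, _ in ROWS) / len(ROWS)

# Weighted average: each label's F1 is weighted by its number of test samples.
weighted_f1 = sum(f1 * support for _, f1, support in ROWS) / total_support

print(total_support, macro_f1, weighted_f1)
```

Rare labels with low F1 (for example grief, with 6 samples) pull the macro average down, while the weighted average is dominated by frequent labels such as neutral.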

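ONNX and quantized models

Full-precision ONNX model (onnx/model.onnx): about 1.5x faster than the Transformers model, with the same macro F1.
INT8 quantized model (onnx/model_quantized.onnx): about 2.5x faster than the Transformers model, with nearly the same macro F1.

Inference results on 5427 samples from the test set:

Model	Size	Macro F1	Speedup	Inference time
Original model	1.4 GB	0.48	1x	44 min 55 s
model.onnx	1.4 GB	0.48	1.5x	29 min 52 s
model_quantized.onnx	0.36 GB	0.48	2.5x	18 min 10 s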
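The speedup column follows from the inference times. As a quick arithmetic check, the sketch below converts the reported times to seconds and derives the speedup and the average time per sample; the hardware used for the benchmark is not stated in the source, so the absolute numbers should not be expected to transfer to other machines.

```python
N_SAMPLES = 5427  # number of test samples reported for the benchmark

def to_seconds(minutes, seconds):
    """Convert a 'minutes, seconds' duration to seconds."""
    return minutes * 60 + seconds

# Total inference times from the benchmark table.
timings = {
    'original': to_seconds(44, 55),
    'model.onnx': to_seconds(29, 52),
    'model_quantized.onnx': to_seconds(18, 10),
}

baseline = timings['original']

# Speedup relative to the original PyTorch model.
speedups = {name: baseline / t for name, t in timings.items()}

# Average wall-clock time per sample, in seconds.
per_sample = {name: t / N_SAMPLES for name, t in timings.items()}

for name in timings:
    print(f"{name}: {speedups[name]:.2f}x, {per_sample[name]:.3f} s/sample")
```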

Using the ONNX models

Loading the full-precision model

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "fyaronskiy/ruRoberta-large-ru-go-emotions"
file_name = "onnx/model.onnx"
model = ORTModelForSequenceClassification.from_pretrained(model_id, file_name=file_name)
tokenizer = AutoTokenizer.from_pretrained(model_id)

Loading the INT8 quantized model

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "fyaronskiy/ruRoberta-large-ru-go-emotions"
file_name = "onnx/model_quantized.onnx"

model = ORTModelForSequenceClassification.from_pretrained(model_id, file_name=file_name)
tokenizer = AutoTokenizer.from_pretrained(model_id)

Once loaded, an ONNX model is used for inference in the same way as the regular Transformers model:

best_thresholds = [0.36734693877551017, 0.2857142857142857, 0.2857142857142857, 0.16326530612244897, 0.14285714285714285, 0.14285714285714285, 0.18367346938775508, 0.3469387755102041, 0.32653061224489793, 0.22448979591836732, 0.2040816326530612, 0.2857142857142857, 0.18367346938775508, 0.2857142857142857, 0.24489795918367346, 0.7142857142857142, 0.02040816326530612, 0.3061224489795918, 0.44897959183673464, 0.061224489795918366, 0.18367346938775508, 0.04081632653061224, 0.08163265306122448, 0.1020408163265306, 0.22448979591836732, 0.3877551020408163, 0.3469387755102041, 0.24489795918367346]
LABELS = ['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']
ID2LABEL = dict(enumerate(LABELS))

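import torch  # used below for no_grad, sigmoid and the threshold tensor

def predict_emotions(text):
    # Tokenize a single text; inputs longer than 128 tokens are truncated.
    inputs = tokenizer(text, truncation=True, add_special_tokens=True, max_length=128, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    # Independent sigmoid per label, then compare with each label's threshold.
    probas = torch.sigmoid(logits).squeeze(dim=0)
    class_binary_labels = (probas > torch.tensor(best_thresholds)).int()
    return [ID2LABEL[label_id] for label_id, value in enumerate(class_binary_labels) if value == 1]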

print(predict_emotions('У вас отличный сервис и лучший кофе в городе, обожаю вашу кофейню!'))
#['admiration', 'love']