emotions-analyzer-bert: open-source emotion analysis model supporting high-accuracy identification of 28 emotions

Emotions Analyzer Bert

Developed by logasanjeev

A multi-label emotion classification model fine-tuned from the BERT-base-uncased architecture, supporting identification of 28 emotions.

Text classification

Transformers

Language: English | Open-source license: MIT | Tags: #multi-label emotion analysis #Reddit comment processing #Focal Loss optimization

Downloads: 3,764

Release date: 4/12/2025

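Model overview

This model is trained on the GoEmotions dataset and is designed specifically to detect multiple emotions within a single text, which makes it suitable for emotion analysis in settings such as social media comments.

Model features

Multi-label emotion classification

Identifies multiple emotions in a text simultaneously, with support for 28 distinct emotion labels.

Efficient inference support

Provides two inference paths, PyTorch and ONNX, to meet the performance requirements of different scenarios.

Optimized threshold handling

Uses optimized per-class classification thresholds to improve prediction accuracy.

Emoji handling

Identifies emojis in the text and converts them to text descriptions so that they contribute to the emotion features.

Model capabilities

Emotion analysis

Multi-label classification

Text preprocessing

Emoji recognition

Use cases

Social media analysis

Emotion analysis of Reddit comments

Analyzes the emotional tendencies of comments by Reddit users.

Can identify 28 different emotions, such as joy, anger, and sadness.

Customer feedback analysis

Emotion analysis of product reviews

Analyzes the emotions in customers' reviews of products or services.

Helps identify customer satisfaction and potential problems.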

🚀 Emotion Analysis BERT Model

This model fine-tunes BERT-base-uncased on the GoEmotions dataset for a multi-label classification task with 28 emotions. The updated version adds an improved Macro F1, ONNX support for efficient inference, and visualizations for better interpretability.

📚 Model details

Attribute	Details
Architecture	BERT-base-uncased (110M parameters)
Training data	GoEmotions (58k Reddit comments, 28 emotions)
Loss function	Focal Loss (alpha=1, gamma=2)
Optimizer	AdamW (lr=2e-5, weight_decay=0.01)
Epochs	5
Batch size	16
Max length	128 tokens
Hardware	Kaggle P100 GPU (16GB)
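
The loss row above lists alpha=1 and gamma=2, but the card does not show the loss code itself. As a rough illustration only, the sketch below implements the standard binary focal loss for a single label in plain Python. The function names `binary_focal_loss` and `binary_cross_entropy` and the sample probabilities are hypothetical, and the author's training code may differ (for example in how the per-label losses are reduced).

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy for one label (baseline for comparison)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, eps))

def binary_focal_loss(p, y, alpha=1.0, gamma=2.0, eps=1e-12):
    """Standard binary focal loss for one label.

    p: predicted probability of the positive class (after sigmoid)
    y: ground-truth label, 0 or 1
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # (1 - p_t) ** gamma shrinks the loss of easy, well-classified examples.
    return -alpha * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))

# Hypothetical predictions for a positive label: one easy, one hard.
for name, p in (("easy", 0.95), ("hard", 0.10)):
    bce = binary_cross_entropy(p, 1)
    focal = binary_focal_loss(p, 1)
    print(f"{name}: bce={bce:.4f} focal={focal:.4f}")
```

With gamma=2 the easy example is down-weighted far more strongly than the hard one, which is how focal loss shifts attention toward rare or difficult labels.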

🚀 Quick start

For accurate predictions with the optimized thresholds, use the Gradio demo. The demo displays the preprocessed text and the top 5 predicted emotions, along with the threshold-based predictions. Example predictions:

Input: "I’m thrilled to win this award! 😄"
- Output: excitement: 0.5836, joy: 0.5290
Input: "This is so frustrating, nothing works. 😣"
- Output: annoyance: 0.6147, anger: 0.4669
Input: "I feel so sorry for what happened. 😢"
- Output: sadness: 0.5321, remorse: 0.9107
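
The difference between a "top 5" view and threshold-based predictions can be sketched in plain Python. The excitement and joy scores are taken from the first example above, and the thresholds are the values listed for those labels in this card's per-class table. The remaining three scores and the helper names `top_k` and `above_threshold` are made up for illustration and are not part of the demo's code.

```python
# Scores for "excitement" and "joy" come from the first example above;
# the other three scores are hypothetical.
scores = {
    "excitement": 0.5836,
    "joy": 0.5290,
    "admiration": 0.3100,
    "gratitude": 0.1200,
    "neutral": 0.0500,
}

# Per-class thresholds as listed in this card's per-class table.
thresholds = {
    "excitement": 0.40,
    "joy": 0.45,
    "admiration": 0.45,
    "gratitude": 0.55,
    "neutral": 0.40,
}

def top_k(scores, k=5):
    """Return the k highest-scoring labels, regardless of threshold."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

def above_threshold(scores, thresholds):
    """Keep only labels whose score reaches that label's own threshold."""
    kept = [(label, s) for label, s in scores.items() if s >= thresholds[label]]
    return sorted(kept, key=lambda kv: kv[1], reverse=True)

print("top 5:", top_k(scores))
print("thresholded:", above_threshold(scores, thresholds))
```

The top-k list always has k entries, while the thresholded list can contain any number of labels, including none.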

📊 Performance

Micro F1: 0.6006 (with optimized thresholds)
Macro F1: 0.5390
Precision: 0.5371
Recall: 0.6812
Hamming Loss: 0.0377
Average positive predictions: 1.4789
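
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below assumes the reported Precision and Recall are micro-averaged (the card does not say so explicitly); under that assumption they reproduce the reported Micro F1. The function name `f1_from_precision_recall` is hypothetical.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

precision = 0.5371  # reported above
recall = 0.6812     # reported above
micro_f1 = f1_from_precision_recall(precision, recall)
print(round(micro_f1, 4))  # → 0.6006
```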

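For detailed evaluation and visualizations, including per-class accuracy, precision, recall, F1, MCC, support, and thresholds, see the Kaggle notebook.

Per-class performance

Per-class metrics on the test set, using the optimized thresholds (see optimized_thresholds.json), are as follows.

Emotion	Accuracy	Precision	Recall	F1 Score	MCC	Support	Threshold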
admiration	0.9410	0.6649	0.7361	0.6987	0.6672	504	0.4500
amusement	0.9801	0.7635	0.8561	0.8071	0.7981	264	0.4500
anger	0.9694	0.6176	0.4242	0.5030	0.4970	198	0.4500
annoyance	0.9121	0.3297	0.4750	0.3892	0.3502	320	0.3500
approval	0.8843	0.2966	0.5755	0.3915	0.3572	351	0.3500
caring	0.9759	0.5196	0.3926	0.4473	0.4396	135	0.4500
confusion	0.9711	0.4861	0.4575	0.4714	0.4567	153	0.4500
curiosity	0.9368	0.4442	0.8275	0.5781	0.5783	284	0.4000
desire	0.9865	0.5714	0.4819	0.5229	0.5180	83	0.4000
disappointment	0.9565	0.2906	0.3907	0.3333	0.3150	151	0.3500
disapproval	0.9235	0.3405	0.5918	0.4323	0.4118	267	0.3500
disgust	0.9810	0.6250	0.4065	0.4926	0.4950	123	0.5500
embarrassment	0.9947	0.7000	0.3784	0.4912	0.5123	37	0.5000
excitement	0.9790	0.4486	0.4660	0.4571	0.4465	103	0.4000
fear	0.9836	0.4599	0.8077	0.5860	0.6023	78	0.3000
gratitude	0.9888	0.9450	0.8778	0.9102	0.9049	352	0.5500
grief	0.9985	0.3333	0.3333	0.3333	0.3326	6	0.3000
joy	0.9768	0.6061	0.6211	0.6135	0.6016	161	0.4500
love	0.9825	0.7826	0.8319	0.8065	0.7978	238	0.5000
nervousness	0.9952	0.4348	0.4348	0.4348	0.4324	23	0.4000
optimism	0.9689	0.5436	0.5699	0.5564	0.5405	186	0.4000
pride	0.9980	0.8571	0.3750	0.5217	0.5662	16	0.4000
realization	0.9737	0.5217	0.1655	0.2513	0.2838	145	0.4500
relief	0.9982	0.5385	0.6364	0.5833	0.5845	11	0.3000
remorse	0.9912	0.5426	0.9107	0.6800	0.6992	56	0.3500
sadness	0.9757	0.5845	0.5321	0.5570	0.5452	156	0.4500
surprise	0.9724	0.4772	0.6667	0.5562	0.5504	141	0.3500
neutral	0.7485	0.5821	0.8372	0.6867	0.5102	1787	0.4000
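
Macro F1 is the unweighted mean of the per-class F1 scores, so the table above can be checked against the headline figure. The sketch below copies the F1 column by hand. The averaging itself is standard, but treating the reported Macro F1 as exactly this mean is an assumption.

```python
# F1 column of the table above, in row order (admiration ... neutral).
class_f1 = [
    0.6987, 0.8071, 0.5030, 0.3892, 0.3915, 0.4473, 0.4714,
    0.5781, 0.5229, 0.3333, 0.4323, 0.4926, 0.4912, 0.4571,
    0.5860, 0.9102, 0.3333, 0.6135, 0.8065, 0.4348, 0.5564,
    0.5217, 0.2513, 0.5833, 0.6800, 0.5570, 0.5562, 0.6867,
]

# Every class counts equally, whatever its support.
macro_f1 = sum(class_f1) / len(class_f1)
print(len(class_f1), round(macro_f1, 4))  # → 28 0.539
```

Because every class carries the same weight, a label with 6 test examples (grief) influences Macro F1 as much as one with 1787 (neutral), which helps explain why Macro F1 sits below Micro F1 here.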

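Visualizations

Per-class F1 scores

[Figure: Class-Wise F1 Scores]

Training curves

[Figure: Training and Validation Loss and Micro F1]

📈 Training insights

The model was trained for 5 epochs using Focal Loss to handle class imbalance. The training and validation curves show consistent improvement.

Training loss decreased from 0.0429 to 0.0134.
Validation Micro F1 peaked at 0.5874 at epoch 5.
See the training curve plot above for details.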

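💻 Usage examples

Quick inference with inference.py (recommended for PyTorch)

The simplest way to use this model with PyTorch is to fetch inference.py from the repository programmatically and call it. The script handles preprocessing, model loading, and inference.

Programmatic download and inference

Run the following Python code to download inference.py and make a prediction. The leading !pip line is notebook syntax; outside a notebook, run the same pip install command in a shell first.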

!pip install transformers torch huggingface_hub emoji -q

import os
import shutil
from importlib import import_module

from huggingface_hub import hf_hub_download

repo_id = "logasanjeev/emotions-analyzer-bert"
# Download inference.py from the model repository into the local Hugging Face cache.
local_file = hf_hub_download(repo_id=repo_id, filename="inference.py")

# Copy the script into the current working directory so it can be imported.
destination = os.path.join(os.getcwd(), "inference.py")
shutil.copy(local_file, destination)

inference_module = import_module("inference")
predict_emotions = inference_module.predict_emotions

text = "I’m thrilled to win this award! 😄"
result, processed = predict_emotions(text)
print(f"Input: {text}")
print(f"Processed: {processed}")
print("Predicted Emotions:")
print(result)

Expected output:

Input: I’m thrilled to win this award! 😄
Processed: i’m thrilled to win this award ! grinning_face_with_smiling_eyes
Predicted Emotions:
excitement: 0.5836
joy: 0.5290

Alternative: manual download

To download inference.py manually, follow these steps.

Install the required dependencies.

pip install transformers torch huggingface_hub emoji

Download inference.py from the repository.
Use it from Python or the command line.

Python example:

from inference import predict_emotions

text = "I’m thrilled to win this award! 😄"
result, processed = predict_emotions(text)
print(f"Input: {text}")
print(f"Processed: {processed}")
print("Predicted Emotions:")
print(result)

Command-line example:

python inference.py "I’m thrilled to win this award! 😄"

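Quick inference with onnx_inference.py (recommended for ONNX)

For faster, more efficient inference with ONNX, use onnx_inference.py. The script runs inference with ONNX Runtime, which is typically lighter-weight than PyTorch.

Programmatic download and inference

Run the following Python code to download onnx_inference.py and make a prediction (the !pip line is notebook syntax).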

!pip install transformers onnxruntime huggingface_hub emoji numpy -q

import os
import shutil
from importlib import import_module

from huggingface_hub import hf_hub_download

repo_id = "logasanjeev/emotions-analyzer-bert"
# Download onnx_inference.py from the model repository into the local Hugging Face cache.
local_file = hf_hub_download(repo_id=repo_id, filename="onnx_inference.py")

# Copy the script into the current working directory so it can be imported.
destination = os.path.join(os.getcwd(), "onnx_inference.py")
shutil.copy(local_file, destination)

onnx_inference_module = import_module("onnx_inference")
predict_emotions = onnx_inference_module.predict_emotions

text = "I’m thrilled to win this award! 😄"
result, processed = predict_emotions(text)
print(f"Input: {text}")
print(f"Processed: {processed}")
print("Predicted Emotions:")
print(result)

Expected output:

Input: I’m thrilled to win this award! 😄
Processed: i’m thrilled to win this award ! grinning_face_with_smiling_eyes
Predicted Emotions:
excitement: 0.5836
joy: 0.5290

Alternative: manual download

To download onnx_inference.py manually, follow these steps.

Install the required dependencies.

pip install transformers onnxruntime huggingface_hub emoji numpy

Download onnx_inference.py from the repository.
Use it from Python or the command line.

Python example:

from onnx_inference import predict_emotions

text = "I’m thrilled to win this award! 😄"
result, processed = predict_emotions(text)
print(f"Input: {text}")
print(f"Processed: {processed}")
print("Predicted Emotions:")
print(result)

Command-line example:

python onnx_inference.py "I’m thrilled to win this award! 😄"

Preprocessing

Before inference, text must be preprocessed to match the conditions used in training.

Replace user mentions (u/username) with [USER].
Replace subreddits (r/subreddit) with [SUBREDDIT].
Replace URLs with [URL].
Convert emojis to text using emoji.demojize (e.g., 😊 → smiling_face_with_smiling_eyes).
Convert the text to lowercase.

Inference with PyTorch

import json
import re

import emoji
import requests
import torch
from transformers import BertForSequenceClassification, BertTokenizer

def preprocess_text(text):
    # Apply the preprocessing steps listed above, in the same order.
    text = re.sub(r'u/\w+', '[USER]', text)
    text = re.sub(r'r/\w+', '[SUBREDDIT]', text)
    text = re.sub(r'http[s]?://\S+', '[URL]', text)
    text = emoji.demojize(text, delimiters=(" ", " "))
    text = text.lower()
    return text

repo_id = "logasanjeev/emotions-analyzer-bert"
model = BertForSequenceClassification.from_pretrained(repo_id)
tokenizer = BertTokenizer.from_pretrained(repo_id)

# Load the label names and per-class thresholds published with the model.
thresholds_url = f"https://huggingface.co/{repo_id}/raw/main/optimized_thresholds.json"
thresholds_data = json.loads(requests.get(thresholds_url).text)
emotion_labels = thresholds_data["emotion_labels"]
thresholds = thresholds_data["thresholds"]

text = "I’m just chilling today."
processed_text = preprocess_text(text)
encodings = tokenizer(processed_text, padding='max_length', truncation=True, max_length=128, return_tensors='pt')
with torch.no_grad():
    # Sigmoid rather than softmax: each label is scored independently.
    probs = torch.sigmoid(model(**encodings).logits).numpy()[0]
# Keep only the labels whose probability reaches that label's threshold.
# float() converts NumPy scalars so the printed list shows plain numbers.
predictions = [(emotion_labels[i], round(float(p), 4)) for i, (p, thresh) in enumerate(zip(probs, thresholds)) if p >= thresh]
predictions = sorted(predictions, key=lambda x: x[1], reverse=True)
print(predictions)
# Output: [('neutral', 0.8147)]

Inference with ONNX

For simplified ONNX inference, use onnx_inference.py as described above. Alternatively, you can use the manual approach below.

import numpy as np
import onnxruntime as ort
import requests

# Reuses repo_id, tokenizer, preprocess_text, emotion_labels, and thresholds
# from the PyTorch example above.
# Use /resolve/ rather than /raw/: assuming model.onnx is stored via Git LFS
# (as large binaries on the Hub normally are), /raw/ would return the small
# LFS pointer file instead of the model itself.
onnx_url = f"https://huggingface.co/{repo_id}/resolve/main/model.onnx"
with open("model.onnx", "wb") as f:
    f.write(requests.get(onnx_url).content)

text = "I’m thrilled to win this award! 😄"
processed_text = preprocess_text(text)
encodings = tokenizer(processed_text, padding='max_length', truncation=True, max_length=128, return_tensors='np')
session = ort.InferenceSession("model.onnx")
inputs = {
    'input_ids': encodings['input_ids'].astype(np.int64),
    'attention_mask': encodings['attention_mask'].astype(np.int64)
}
logits = session.run(None, inputs)[0][0]
probs = 1 / (1 + np.exp(-logits))  # Sigmoid
# Keep only the labels whose probability reaches that label's threshold.
predictions = [(emotion_labels[i], round(float(p), 4)) for i, (p, thresh) in enumerate(zip(probs, thresholds)) if p >= thresh]
predictions = sorted(predictions, key=lambda x: x[1], reverse=True)
print(predictions)
# Output: [('excitement', 0.5836), ('joy', 0.529)]