herbert-base-cased-sentiment open-source model: classify Polish text as positive, negative, or neutral for free

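Herbert Base Cased Sentiment

Developed by Voicelab

A Polish sentiment analysis model based on HerBERT that classifies Polish text as positive, neutral, or negative.

Text classification

Transformers

Other #Polish sentiment analysis #User review classification #HerBERT architecture

Downloads: 703

Release date: 11/2/2022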

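Model overview

This model is a Polish sentiment classifier based on the HerBERT architecture. It is designed specifically for analyzing the sentiment of Polish text and supports three classes: negative, neutral, and positive.

Model features

Polish only

A sentiment analysis model optimized specifically for Polish

Three-class sentiment analysis

Classifies text into three sentiment classes: negative, neutral, and positive

Built on the HerBERT architecture

Based on HerBERT, a language model pretrained on Polish, which gives it a strong grasp of the language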

Model capabilities

Polish text classification

Sentiment analysis

Natural language processing

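Use cases

Customer service

Customer feedback analysis

Analyze the sentiment of customer reviews and feedback

Helps identify dissatisfied customers so they can be handled first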
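
As a rough illustration of the customer feedback use case, the sketch below orders feedback so that items predicted as negative come first. The `prioritize_feedback` helper, the `PRIORITY` mapping, and the sample labels are hypothetical; in practice the labels would come from the classifier.

```python
from typing import List, Tuple

# Hypothetical ordering: lower number means handled earlier.
PRIORITY = {"negative": 0, "neutral": 1, "positive": 2}


def prioritize_feedback(items: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (text, predicted_label) pairs: negative, then neutral, then positive."""
    # sorted() is stable, so items with the same label keep their original order.
    return sorted(items, key=lambda item: PRIORITY[item[1]])


# Illustrative labels, assumed to have been produced by the model beforehand.
feedback = [
    ("Świetna obsługa, polecam!", "positive"),
    ("Paczka przyszła uszkodzona.", "negative"),
    ("Zamówienie dotarło we wtorek.", "neutral"),
]
queue = prioritize_feedback(feedback)
for text, label in queue:
    print(label, "|", text)
```

Sorting by label is the simplest possible policy; a real support queue would likely combine the predicted sentiment with other signals such as ticket age.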

Social media monitoring

Social media sentiment analysis

Monitor sentiment toward a brand on social media

Track shifts in public attitude toward the brand in real time
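
One way to follow attitude changes over time is to aggregate predicted labels per day. The sketch below is a minimal, offline version of that idea: `daily_sentiment_share` and the sample records are hypothetical, and the labels are assumed to have been predicted by the model already. It does not cover streaming or real-time ingestion.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

LABELS = ("negative", "neutral", "positive")


def daily_sentiment_share(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Return, for each day, the fraction of posts carrying each label."""
    counts = defaultdict(Counter)
    for day, label in records:
        counts[day][label] += 1
    shares = {}
    for day, counter in sorted(counts.items()):
        total = sum(counter.values())
        # Counter returns 0 for labels that did not occur on that day.
        shares[day] = {label: counter[label] / total for label in LABELS}
    return shares


# Illustrative (day, predicted_label) pairs; the dates and labels are made up.
records = [
    ("2022-11-01", "positive"),
    ("2022-11-01", "negative"),
    ("2022-11-02", "negative"),
    ("2022-11-02", "negative"),
    ("2022-11-02", "neutral"),
    ("2022-11-02", "positive"),
]
trend = daily_sentiment_share(records)
for day, share in trend.items():
    print(day, share)
```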

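🚀 Sentiment classification for Polish

This project performs sentiment classification on Polish text. It uses a Transformer model to distinguish three sentiments: negative, neutral, and positive.

🚀 Quick start

Run the following code to classify the sentiment of a Polish text.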

import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Map class indices to sentiment labels
id2label = {0: "negative", 1: "neutral", 2: "positive"}
tokenizer = AutoTokenizer.from_pretrained("Voicelab/herbert-base-cased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("Voicelab/herbert-base-cased-sentiment")

# "How nice, it snowed today! Shall we build a snowman today?"
# Named `texts` rather than `input` to avoid shadowing the Python builtin.
texts = ["Ale fajnie, spadł dzisiaj śnieg! Ulepimy dziś bałwana?"]

encoding = tokenizer(
    texts,
    add_special_tokens=True,
    return_token_type_ids=True,
    truncation=True,
    padding=True,  # pad only to the longest text in the batch
    return_attention_mask=True,
    return_tensors='pt',
)
# logits has shape (1, 3): one row per input text, one column per class
logits = model(**encoding).logits.detach().cpu().numpy()
# Index the single row explicitly before taking the arg-max
prediction = id2label[int(np.argmax(logits[0]))]
print(texts, "--->", prediction)

Example prediction:

['Ale fajnie, spadł dzisiaj śnieg! Ulepimy dziś bałwana?'] ---> positive
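
The model returns raw logits, and the arg-max alone does not show how decisive a prediction is. The sketch below converts one row of logits into probabilities with a softmax. The logit values are made up for illustration; with real model output you would pass one row of the logits converted with `.tolist()`. The `softmax` helper is written here in plain Python and is not part of the model's API.

```python
import math
from typing import List

id2label = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities; subtracting the max keeps exp() stable."""
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up logits for a single text, in the order negative, neutral, positive.
example_logits = [-1.2, 0.3, 2.5]
probs = softmax(example_logits)
best = max(range(len(probs)), key=lambda index: probs[index])
print(id2label[best], round(probs[best], 3))
```

A probability close to 1/3 for the winning class suggests the model is unsure, which can be a reason to route the text to manual review.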

💻 Usage examples

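Advanced usage

# Classify several input texts in one batch.
# Reuses np, tokenizer, model, and id2label from the Quick start example.
input_texts = ["Tekst pierwszy", "Tekst drugi", "Tekst trzeci"]  # "First text", "Second text", "Third text"
encodings = tokenizer(
    input_texts,
    add_special_tokens=True,
    return_token_type_ids=True,
    truncation=True,
    padding=True,  # pad only to the longest text in the batch
    return_attention_mask=True,
    return_tensors='pt'
)
# logits has shape (number of texts, 3)
outputs = model(**encodings).logits.detach().cpu().numpy()
# Take the arg-max of each row to get one label per text
predictions = [id2label[int(np.argmax(row))] for row in outputs]
for text, pred in zip(input_texts, predictions):
    print(text, "--->", pred)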
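
Passing a very long list of texts through the model in a single call can use a lot of memory, so it is common to process the texts in fixed-size chunks. The sketch below shows one way to do that. `iter_batches`, `classify_in_batches`, and `dummy_classify` are hypothetical helpers, and the stand-in classifier exists only so the example runs without downloading the model; in practice `classify_batch` would wrap the tokenizer and model calls shown above.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def classify_in_batches(
    texts: Sequence[str],
    classify_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Run classify_batch on each chunk and concatenate the labels in order."""
    predictions: List[str] = []
    for batch in iter_batches(texts, batch_size):
        predictions.extend(classify_batch(batch))
    return predictions


def dummy_classify(batch: List[str]) -> List[str]:
    """Stand-in for the real model call; always answers 'neutral'."""
    return ["neutral"] * len(batch)


texts = [f"Tekst {i}" for i in range(5)]
batches = list(iter_batches(texts, 2))
labels = classify_in_batches(texts, dummy_classify, batch_size=2)
print(len(batches), labels)
```

The default `batch_size` of 16 is an arbitrary starting point; a suitable value depends on text length and available memory.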

📚 Documentation

Overview

Attribute	Details
Model type	allegro/herbert-base-cased
Training data	Reviews + proprietary data
Blog post	Sentiment analysis - COVID-19 – the source of the heated discussion