HebEMO_surprise Open-Source Sentiment Detection Model - Accurately Identify the Sentiment Polarity of Modern Hebrew

Hebemo Surprise

Developed by avichr

HebEMO is a tool for detecting polarity and extracting emotions in modern Hebrew text. It was trained on a COVID-19-related dataset and performs well on both polarity classification and emotion recognition.

Text Classification

Transformers

Release Time : 3/2/2022

Model Overview

HebEMO is a model specifically designed to analyze emotions and polarity in modern Hebrew user-generated content (UGC). It can identify eight basic emotions (anger, disgust, anticipation, fear, joy, sadness, surprise, and trust) as well as the overall sentiment polarity of the text (positive, negative, neutral).

Model Features

High-Performance Emotion Recognition

Achieves a weighted-average F1 score of 0.96 on polarity classification. On emotion recognition, F1 scores range from 0.78 to 0.97 for every emotion except surprise, which the model has difficulty capturing.

Optimized Specifically for Hebrew

Trained on a unique modern Hebrew COVID-related dataset and specifically optimized for Hebrew user-generated content.

Multi-Emotion Dimensional Analysis

Capable of simultaneously identifying eight basic emotions (anger, disgust, anticipation, fear, joy, sadness, surprise, and trust).

AWS Cloud Deployment

The sentiment (polarity) analysis model is already deployed on AWS, facilitating cloud integration and usage.

Model Capabilities

Hebrew Text Sentiment Analysis

Multi-dimensional Emotion Recognition

Text Polarity Classification

User-Generated Content Analysis

Use Cases

Social Media Analysis

News Comment Sentiment Analysis

Analyze the sentiment tendencies and emotional expressions in user comments on news websites.

Accurately identifies negative emotions such as anger and disgust in comments, aiding content moderation.

Market Research

Product Feedback Analysis

Analyze Hebrew user evaluations and feedback on products or services.

Accurately classifies positive/negative feedback and identifies user emotions.

🚀 HebEMO - Emotion Recognition Model for Modern Hebrew

HebEMO detects polarity and extracts emotions from modern Hebrew user-generated content (UGC). It was trained on a unique COVID-19-related dataset that we collected and annotated. The model achieved a weighted-average F1-score of 0.96 for polarity classification. For emotion detection, F1-scores ranged from 0.78 to 0.97, except for surprise, which the model had difficulty capturing (F1 = 0.41). These results exceed the best previously reported performance, even when compared with English-language models.

✨ Features

Polarity and Emotion Detection: HebEMO can accurately detect the polarity and extract emotions from modern Hebrew UGC.
High Performance: The model achieved excellent results in both polarity classification and emotion detection.
Unique Dataset: Trained on a unique Covid-19 related dataset collected and annotated by the authors.

📦 Installation

Clone the repository and install the required libraries. The leading "!" runs each line as a shell command in a notebook environment such as Jupyter or Colab; drop it when running the commands in a terminal.

!git clone https://github.com/avichaychriqui/HeBERT.git
!pip install pyplutchik==0.0.7
!pip install transformers==4.14.1

💻 Usage Examples

Basic Usage

# Clone the repository and import the necessary modules
!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import *
HebEMO_model = HebEMO()

# Analyze text from a file; returns the analysis as a pandas.DataFrame
HebEMO_model.hebemo(input_path = 'data/text_example.txt')

# Analyze text directly and plot the results
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)

Advanced Usage - Sentiment Classification

from transformers import AutoTokenizer, AutoModel, pipeline

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis") #same as 'avichr/heBERT' tokenizer
model = AutoModel.from_pretrained("avichr/heBERT_sentiment_analysis")

# Create a sentiment analysis pipeline.
# return_all_scores=True returns a score for every label, not only the top one.
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores=True
)

# Analyze text
sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')
sentiment_analysis('קפה זה טעים')
sentiment_analysis('אני לא אוהב את העולם')
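
Because the pipeline is created with return_all_scores=True, each input yields one score per label rather than a single prediction. The sketch below shows one way to reduce such a result to a single label. The helper name top_label and the sample scores are hypothetical, and the assumed shape (a list of label/score dictionaries per input) should be checked against the actual output of your installed transformers version.

```python
from typing import Dict, List, Union

# One entry of the assumed pipeline output: {"label": ..., "score": ...}
Score = Dict[str, Union[str, float]]


def top_label(scores: List[Score]) -> str:
    """Return the label with the highest score for one input text."""
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(scores, key=lambda item: item["score"])
    return str(best["label"])


# Illustrative values only; this is not real model output, and the real
# label names are whatever the model configuration defines.
example_scores = [
    {"label": "neutral", "score": 0.10},
    {"label": "positive", "score": 0.85},
    {"label": "negative", "score": 0.05},
]

print(top_label(example_scores))  # prints "positive"
```

Under the same assumption about the output shape, the scores for a single text would be the first element of the list returned by sentiment_analysis(text).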

📚 Documentation

Emotion UGC Data Description

Our UGC data consists of comments posted on news articles collected from 3 major Israeli news sites between January 2020 and August 2020. The total data size is approximately 150 MB, containing over 7 million words and 350K sentences.

Around 2000 sentences were annotated by crowd members (3-10 annotators per sentence) for overall sentiment (polarity) and eight emotions: anger, disgust, anticipation, fear, joy, sadness, surprise, and trust. The table below shows the fraction of sentences in which each emotion appeared.

Property	Details
Model Type	Emotion Recognition and Sentiment Analysis
Training Data	Comments from 3 major Israeli news sites (Jan 2020 - Aug 2020), ~150 MB, over 7 million words, 350K sentences

	anger	disgust	anticipation	fear	joy	sadness	surprise	trust	sentiment
ratio	0.78	0.83	0.58	0.45	0.12	0.59	0.17	0.11	0.25

Performance

Emotion Recognition

emotion	f1-score	precision	recall
anger	0.96	0.99	0.93
disgust	0.97	0.98	0.96
anticipation	0.82	0.80	0.87
fear	0.79	0.88	0.72
joy	0.90	0.97	0.84
sadness	0.90	0.86	0.94
surprise	0.40	0.44	0.37
trust	0.83	0.86	0.80

The above metrics are for the positive class (meaning, the emotion is reflected in the text).
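
As a sanity check on the table above, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the published precision and recall values. The tolerance of 0.015 is an assumption chosen to absorb the rounding of the two-decimal inputs; it is not something the authors specify.

```python
# (precision, recall, reported F1) copied from the emotion table above.
REPORTED = {
    "anger": (0.99, 0.93, 0.96),
    "disgust": (0.98, 0.96, 0.97),
    "anticipation": (0.80, 0.87, 0.82),
    "fear": (0.88, 0.72, 0.79),
    "joy": (0.97, 0.84, 0.90),
    "sadness": (0.86, 0.94, 0.90),
    "surprise": (0.44, 0.37, 0.40),
    "trust": (0.86, 0.80, 0.83),
}

# Assumed tolerance: the inputs are rounded to two decimals, so the
# recomputed value can drift slightly from the reported one.
TOLERANCE = 0.015


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_differences() -> dict:
    """Absolute gap between recomputed and reported F1, per emotion."""
    return {
        emotion: abs(f1_score(p, r) - reported_f1)
        for emotion, (p, r, reported_f1) in REPORTED.items()
    }


for emotion, gap in f1_differences().items():
    status = "ok" if gap < TOLERANCE else "check"
    print(f"{emotion}: gap={gap:.3f} ({status})")
```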

Sentiment (Polarity) Analysis

	precision	recall	f1-score
neutral	0.83	0.56	0.67
positive	0.96	0.92	0.94
negative	0.97	0.99	0.98
accuracy			0.97
macro avg	0.92	0.82	0.86
weighted avg	0.96	0.97	0.96
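
The macro average in the table above is the unweighted mean of the three per-class values, so it can be reproduced directly from the table. The weighted average cannot be reproduced here because it depends on the number of examples per class, which the table does not report. The sketch below recomputes only the macro row; the function and variable names are illustrative.

```python
# Per-class values copied from the polarity table above.
PER_CLASS = {
    "neutral": {"precision": 0.83, "recall": 0.56, "f1": 0.67},
    "positive": {"precision": 0.96, "recall": 0.92, "f1": 0.94},
    "negative": {"precision": 0.97, "recall": 0.99, "f1": 0.98},
}


def macro_average(rows: dict, metric: str) -> float:
    """Unweighted mean of one metric across all classes."""
    values = [row[metric] for row in rows.values()]
    return sum(values) / len(values)


for metric in ("precision", "recall", "f1"):
    print(f"macro {metric}: {macro_average(PER_CLASS, metric):.2f}")
```

The gap between the macro F1-score and the weighted F1-score suggests that the weaker neutral class makes up a small share of the evaluation data, although the class sizes are not stated.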

⚠️ Important Note

The sentiment (polarity) analysis model is also available on AWS. For more information, see the AWS Git repository.

🔧 Technical Details

The model was trained on a unique Covid-19 related dataset, which gives it an edge in analyzing modern Hebrew UGC. The high performance in polarity classification and emotion detection is a result of the carefully designed training process and the quality of the annotated data.

📄 License

No license information is provided in the original document.

Contact Us

Avichay Chriqui
Inbal Yahav
The Coller Semitic Languages AI Lab

Thank you, תודה, شكرا

Citation

If you used this model, please cite us as:

Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.

@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={INFORMS Journal on Data Science},
  year={2022}
}
