HebEMO: Emotion Recognition Model for Modern Hebrew
HebEMO is a tool for detecting polarity and extracting emotions from modern Hebrew user-generated content (UGC). It was trained on a unique COVID-19-related dataset that we collected and annotated. HebEMO achieves high performance in polarity classification, with a weighted average F1-score of 0.96. In emotion detection, it reaches F1-scores between 0.78 and 0.97 for every emotion except surprise, which the model had difficulty capturing (F1 = 0.41). These results outperform the best previously reported performance, even when compared with English-language models.
⨠Features
- Polarity and Emotion Detection: HebEMO detects the polarity (sentiment) of modern Hebrew UGC and extracts the emotions it expresses.
- High Performance: strong results in both polarity classification and emotion detection; see the Performance section for per-class scores.
- Unique Dataset: trained on a specially collected and annotated COVID-19-related dataset.
Installation
HebEMO requires the following packages. The commands are written as Jupyter or Colab notebook cells; drop the leading `!` to run them in a regular shell:

```
!pip install pyplutchik==0.0.7
!pip install transformers==4.14.1
```

Then clone the repository:

```
!git clone https://github.com/avichaychriqui/HeBERT.git
```
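Because the instructions pin exact versions, it can help to confirm what is actually installed before running HebEMO. The sketch below is a hypothetical helper (`check_pins` is not part of HeBERT) built only on the standard library's `importlib.metadata` (Python 3.8+). It assumes the distribution names match the pip package names and lists every pinned package that is missing or installed at a different version.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the installation instructions (distribution names assumed to match pip names).
PINNED = {"pyplutchik": "0.0.7", "transformers": "4.14.1"}


def check_pins(pins):
    """Return {package: installed_version_or_None} for every package that does not match its pin."""
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    if not problems:
        print("All pinned packages match.")
    for package, installed in problems.items():
        print(f"{package}: expected {PINNED[package]}, found {installed}")
```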
Usage Examples
Basic Usage
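```python
# Assumes the HeBERT repository was cloned into the working directory (see Installation)
from HeBERT.src.HebEMO import *
HebEMO_model = HebEMO()
# Analyze the texts stored in a file
HebEMO_model.hebemo(input_path='data/text_example.txt')
# Analyze a single sentence ("Life is beautiful and happy");
# plot=True also plots the detected emotions (uses pyplutchik)
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)
```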
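The exact file format that `input_path` expects is not documented here. Assuming it is a plain-text file with one text per line, the sketch below writes and reads such a file with explicit UTF-8 encoding, which matters for Hebrew: saving or opening it in another encoding turns the text into unreadable characters. `write_texts`, `read_texts`, and the file name are hypothetical, not part of HebEMO.

```python
import tempfile
from pathlib import Path


def write_texts(path, texts):
    """Write one text per line to `path`, UTF-8 encoded (assumed input format)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # A newline inside a text would split it into two entries, so replace it with a space.
    lines = [text.replace("\n", " ").strip() for text in texts]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_texts(path):
    """Read the texts back, decoding explicitly as UTF-8 and skipping blank lines."""
    content = Path(path).read_text(encoding="utf-8")
    return [line for line in content.splitlines() if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "data" / "my_texts.txt"  # hypothetical file name
        write_texts(target, ["החיים יפים ומאושרים", "קפה זה טעים"])
        print(read_texts(target))
```

If HebEMO does accept this layout, a file written this way could be passed as `input_path`.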

Advanced Usage: Sentiment Classification Model (Polarity Only)

```python
from transformers import pipeline

# Load the HeBERT sentiment model and its tokenizer from the Hugging Face Hub.
# return_all_scores=True returns the score of every label, not only the top one.
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores=True,
)

print(sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים'))  # "I can't decide what to eat for lunch"
print(sentiment_analysis('קפה זה טעים'))  # "Coffee is tasty"
print(sentiment_analysis('אני לא אוהב את העולם'))  # "I don't like the world"
```
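With `return_all_scores=True`, the pipeline returns a list of `{'label': ..., 'score': ...}` dictionaries for each input text. If you only need the predicted polarity, a small helper can pick the highest-scoring label. `top_label` and `top_labels` are hypothetical helpers, the sample scores are invented, and the label strings shown are illustrative: the real ones come from the model's configuration and may be spelled differently. Because the nesting of single-input results has varied across transformers versions, the sketch accepts both a flat list and a list of lists.

```python
def top_label(scores):
    """Return (label, score) for the highest-scoring entry in one text's list of scores."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


def top_labels(pipeline_output):
    """Return (label, score) for each text in a pipeline result."""
    # A single input may come back as a flat list of dicts or as a list holding
    # one such list; normalize to a list of lists before picking the best label.
    if pipeline_output and isinstance(pipeline_output[0], dict):
        pipeline_output = [pipeline_output]
    return [top_label(scores) for scores in pipeline_output]


# Output shaped like the pipeline's, with invented scores and illustrative labels.
example_output = [[
    {"label": "neutral", "score": 0.05},
    {"label": "positive", "score": 0.90},
    {"label": "negative", "score": 0.05},
]]
print(top_labels(example_output))  # [('positive', 0.9)]
```

With the model loaded, `top_labels(sentiment_analysis('קפה זה טעים'))` would return the predicted polarity and its score.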
Documentation
Emotion UGC Data Description
Our UGC data consists of comments posted on news articles from three major Israeli news sites between January and August 2020. The total data size is approximately 150 MB, comprising over 7 million words and 350K sentences. About 2,000 sentences were annotated by crowd members (3 to 10 annotators per sentence) for overall sentiment (polarity) and eight emotions: anger, disgust, anticipation, fear, joy, sadness, surprise, and trust. The table below shows the fraction of annotated sentences in which each emotion appeared; a sentence can express more than one emotion.
| | anger | disgust | anticipation | fear | joy | sadness | surprise | trust | sentiment |
|---|---|---|---|---|---|---|---|---|---|
| ratio | 0.78 | 0.83 | 0.58 | 0.45 | 0.12 | 0.59 | 0.17 | 0.11 | 0.25 |
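Each emotion ratio is the share of annotated sentences labeled with that emotion. Because emotion labeling is multi-label, the ratios need not sum to 1: the eight emotion ratios in the table add up to 3.63. The sketch below shows how such ratios can be computed from final per-sentence binary labels. How the 3 to 10 crowd annotations per sentence were combined into one label is not described here, so the input format (one dict per sentence) is an assumption, and the three example sentences are invented.

```python
EMOTIONS = ["anger", "disgust", "anticipation", "fear", "joy", "sadness", "surprise", "trust"]


def emotion_ratios(labels):
    """Fraction of sentences whose binary label for each emotion is 1.

    `labels` holds one dict per sentence mapping emotion -> 0 or 1 (assumed format);
    missing emotions count as 0.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("no annotated sentences")
    return {e: sum(row.get(e, 0) for row in labels) / n for e in EMOTIONS}


# Three invented sentences; a sentence may carry several emotions (multi-label).
sample = [
    {"anger": 1, "disgust": 1},
    {"anger": 1, "sadness": 1, "fear": 1},
    {"joy": 1, "trust": 1},
]
ratios = emotion_ratios(sample)
print(round(ratios["anger"], 2))       # 0.67
print(round(sum(ratios.values()), 2))  # 2.33 (above 1 because labels overlap)
```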
Performance
Emotion Recognition
| emotion | f1-score | precision | recall |
|---|---|---|---|
| anger | 0.96 | 0.99 | 0.93 |
| disgust | 0.97 | 0.98 | 0.96 |
| anticipation | 0.82 | 0.80 | 0.87 |
| fear | 0.79 | 0.88 | 0.72 |
| joy | 0.90 | 0.97 | 0.84 |
| sadness | 0.90 | 0.86 | 0.94 |
| surprise | 0.40 | 0.44 | 0.37 |
| trust | 0.83 | 0.86 | 0.80 |
These metrics are for the positive class, that is, sentences in which the emotion is present.
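Each F1-score in the emotion table should equal the harmonic mean of the precision and recall in its row, F1 = 2 · precision · recall / (precision + recall). Recomputing it from the rounded table values reproduces every reported F1 to within 0.015; the largest gap is anticipation (about 0.83 recomputed versus 0.82 reported). `f1` and `REPORTED` are illustrative names for this check.

```python
def f1(precision, recall):
    """F1-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# emotion: (reported f1-score, precision, recall), copied from the emotion table.
REPORTED = {
    "anger": (0.96, 0.99, 0.93),
    "disgust": (0.97, 0.98, 0.96),
    "anticipation": (0.82, 0.80, 0.87),
    "fear": (0.79, 0.88, 0.72),
    "joy": (0.90, 0.97, 0.84),
    "sadness": (0.90, 0.86, 0.94),
    "surprise": (0.40, 0.44, 0.37),
    "trust": (0.83, 0.86, 0.80),
}

for emotion, (reported, precision, recall) in REPORTED.items():
    recomputed = f1(precision, recall)
    print(f"{emotion:<13} reported={reported:.2f} recomputed={recomputed:.3f}")
```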
Sentiment (Polarity) Analysis
| | precision | recall | f1-score |
|---|---|---|---|
| neutral | 0.83 | 0.56 | 0.67 |
| positive | 0.96 | 0.92 | 0.94 |
| negative | 0.97 | 0.99 | 0.98 |
| accuracy | | | 0.97 |
| macro avg | 0.92 | 0.82 | 0.86 |
| weighted avg | 0.96 | 0.97 | 0.96 |
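The macro average is the unweighted mean of the per-class scores, so it can be recomputed directly from the neutral, positive, and negative rows. The weighted average additionally weights each class by its number of test sentences (its support), which the table does not report, so it cannot be reproduced from these numbers alone; the higher weighted scores suggest that the neutral class, whose scores are lowest, is comparatively small. `PER_CLASS` and `macro_average` are illustrative names.

```python
from statistics import mean

# class: (precision, recall, f1-score), copied from the sentiment table.
PER_CLASS = {
    "neutral": (0.83, 0.56, 0.67),
    "positive": (0.96, 0.92, 0.94),
    "negative": (0.97, 0.99, 0.98),
}


def macro_average(per_class):
    """Unweighted mean of each metric across classes."""
    columns = zip(*per_class.values())  # regroup into (precisions, recalls, f1-scores)
    return tuple(mean(column) for column in columns)


precision, recall, f1_score = macro_average(PER_CLASS)
print(f"macro avg: precision={precision:.2f} recall={recall:.2f} f1={f1_score:.2f}")
# macro avg: precision=0.92 recall=0.82 f1=0.86
```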
The sentiment (polarity) analysis model is also available on AWS. For more information, see AWS' git.
License
No license information is provided.
Contact Us
Thank you, תודה, شكرا
Citation
If you use this model, please cite us as:
Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.
```bibtex
@article{chriqui2021hebert,
  title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
  author={Chriqui, Avihay and Yahav, Inbal},
  journal={INFORMS Journal on Data Science},
  year={2022}
}
```