HebEMO - Emotion Recognition Model for Modern Hebrew
HebEMO is a tool for detecting polarity and extracting emotions from modern Hebrew User-Generated Content (UGC). It was trained on a unique Covid-19 related dataset that we collected and annotated. The model achieved a weighted average F1-score of 0.96 for polarity classification. In emotion detection, F1-scores ranged from 0.78 to 0.97 for every emotion except surprise, which the model had difficulty capturing (F1 = 0.41). These results outperform the best-reported performance, even when compared to English language models.
⨠Features
- Polarity and Emotion Detection: HebEMO can accurately detect the polarity and extract emotions from modern Hebrew UGC.
- High Performance: The model achieved excellent results in both polarity classification and emotion detection.
- Unique Dataset: Trained on a unique Covid-19 related dataset collected and annotated by the authors.
Installation
Clone the repository and install the required libraries. The leading "!" runs each command from a notebook cell; drop it when running the commands in a shell.
!git clone https://github.com/avichaychriqui/HeBERT.git
!pip install pyplutchik==0.0.7
!pip install transformers==4.14.1
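Because the install commands pin exact versions, it can help to confirm that the active environment actually matches them. The following is a minimal sketch, not part of HeBERT: the helper name `check_pins` and the `PINNED` dictionary are hypothetical, and only the two pins shown above are checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the install commands above.
PINNED = {"pyplutchik": "0.0.7", "transformers": "4.14.1"}


def check_pins(pins, lookup=version):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    `lookup` is injectable so the function can be exercised without
    the packages being installed (hypothetical helper, not HeBERT API).
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


# Report every pinned package whose installed version differs.
for package, (expected, installed) in check_pins(PINNED).items():
    print(f"{package}: expected {expected}, found {installed}")
```

An empty result means both pinned packages are installed at the expected versions; a `None` entry means the package was not found.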
Usage Examples
Basic Usage
!git clone https://github.com/avichaychriqui/HeBERT.git
from HeBERT.src.HebEMO import *
HebEMO_model = HebEMO()
HebEMO_model.hebemo(input_path = 'data/text_example.txt')
hebEMO_df = HebEMO_model.hebemo(text='החיים יפים ומאושרים', plot=True)  # "Life is beautiful and happy"
Advanced Usage - Sentiment Classification
from transformers import AutoTokenizer, AutoModel, pipeline
tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT_sentiment_analysis")
model = AutoModel.from_pretrained("avichr/heBERT_sentiment_analysis")
sentiment_analysis = pipeline(
    "sentiment-analysis",
    model="avichr/heBERT_sentiment_analysis",
    tokenizer="avichr/heBERT_sentiment_analysis",
    return_all_scores = True  # return a score for every label, not only the top one
)
sentiment_analysis('אני מתלבט מה לאכול לארוחת צהריים')  # "I am debating what to eat for lunch"
sentiment_analysis('קפה זה טעים')  # "Coffee is tasty"
sentiment_analysis('אני לא אוהב את העולם')  # "I do not like the world"
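With `return_all_scores = True`, the pipeline returns a score for every label rather than a single prediction, so callers usually need to pick the winning label themselves. The sketch below assumes the output is a list with one inner list of `{"label", "score"}` dictionaries per input text. The helper `top_label` is hypothetical, the label names follow the polarity table in this document, and the scores are made up for illustration rather than taken from the model.

```python
from typing import Dict, List, Union

Score = Dict[str, Union[str, float]]


def top_label(scores: List[Score]) -> Score:
    """Return the entry with the highest score from one text's score list."""
    return max(scores, key=lambda entry: entry["score"])


# Hypothetical scores shaped like one inner list of the pipeline output.
# The numbers are illustrative only, not real model output.
example_scores = [
    {"label": "neutral", "score": 0.10},
    {"label": "positive", "score": 0.85},
    {"label": "negative", "score": 0.05},
]

best = top_label(example_scores)
print(best["label"], best["score"])
```

Under the same assumption about the output shape, the helper would be applied to the first inner list of a real call, for example `top_label(sentiment_analysis(text)[0])`.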
Documentation
Emotion UGC Data Description
Our UGC data consists of comments posted on news articles collected from 3 major Israeli news sites between January 2020 and August 2020. The total data size is approximately 150 MB, containing over 7 million words and 350K sentences.
Around 2000 sentences were annotated by crowd members (3 to 10 annotators per sentence) for overall sentiment (polarity) and eight emotions: anger, disgust, anticipation, fear, joy, sadness, surprise, and trust. The table below shows the fraction of sentences in which each emotion appeared; its column labels use "expectation" for anticipation and "happy" for joy.
| Property | Details |
|---|---|
| Model Type | Emotion Recognition and Sentiment Analysis |
| Training Data | Comments from 3 major Israeli news sites (Jan 2020 - Aug 2020), ~150 MB, over 7 million words, 350K sentences |
| | anger | disgust | expectation | fear | happy | sadness | surprise | trust | sentiment |
|---|---|---|---|---|---|---|---|---|---|
| ratio | 0.78 | 0.83 | 0.58 | 0.45 | 0.12 | 0.59 | 0.17 | 0.11 | 0.25 |
Performance
Emotion Recognition
| emotion | f1-score | precision | recall |
|---|---|---|---|
| anger | 0.96 | 0.99 | 0.93 |
| disgust | 0.97 | 0.98 | 0.96 |
| anticipation | 0.82 | 0.80 | 0.87 |
| fear | 0.79 | 0.88 | 0.72 |
| joy | 0.90 | 0.97 | 0.84 |
| sadness | 0.90 | 0.86 | 0.94 |
| surprise | 0.40 | 0.44 | 0.37 |
| trust | 0.83 | 0.86 | 0.80 |
The above metrics are for the positive class (meaning, the emotion is reflected in the text).
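For readers who want to relate the three columns, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall values reported above; the function name `f1_score` and the `reported` dictionary are illustrative only and are not part of HebEMO.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the emotion table above.
reported = {
    "anger": (0.99, 0.93, 0.96),
    "disgust": (0.98, 0.96, 0.97),
    "anticipation": (0.80, 0.87, 0.82),
    "fear": (0.88, 0.72, 0.79),
    "joy": (0.97, 0.84, 0.90),
    "sadness": (0.86, 0.94, 0.90),
    "surprise": (0.44, 0.37, 0.40),
    "trust": (0.86, 0.80, 0.83),
}

# Compare the recomputed F1 with the reported one for each emotion.
for emotion, (precision, recall, f1_reported) in reported.items():
    f1 = f1_score(precision, recall)
    print(f"{emotion:<13} computed={f1:.3f} reported={f1_reported:.2f}")
```

Seven of the eight rows reproduce the reported F1 after rounding to two decimals; anticipation comes out near 0.83 against a reported 0.82.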
Sentiment (Polarity) Analysis
| | precision | recall | f1-score |
|---|---|---|---|
| neutral | 0.83 | 0.56 | 0.67 |
| positive | 0.96 | 0.92 | 0.94 |
| negative | 0.97 | 0.99 | 0.98 |
| accuracy | | | 0.97 |
| macro avg | 0.92 | 0.82 | 0.86 |
| weighted avg | 0.96 | 0.97 | 0.96 |
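The "macro avg" row can be reproduced from the per-class rows, assuming the table follows the convention of scikit-learn's classification report, where the macro average is the unweighted mean of each metric across classes. The sketch below is a small check under that assumption; the names `per_class` and `macro_average` are illustrative.

```python
from statistics import mean

# Per-class (precision, recall, f1-score) copied from the polarity table above.
per_class = {
    "neutral": (0.83, 0.56, 0.67),
    "positive": (0.96, 0.92, 0.94),
    "negative": (0.97, 0.99, 0.98),
}


def macro_average(rows):
    """Unweighted mean of each metric column across all classes."""
    columns = zip(*rows.values())  # transpose rows into metric columns
    return tuple(mean(column) for column in columns)


macro_precision, macro_recall, macro_f1 = macro_average(per_class)
print(f"precision={macro_precision:.2f} recall={macro_recall:.2f} f1={macro_f1:.2f}")
```

The "weighted avg" row cannot be recomputed the same way, because under the same convention it weights each class by its number of test samples, which the table does not list. The weighted values sit close to the negative-class row, which suggests that negative comments make up most of the test set.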
Important Note
The sentiment (polarity) analysis model is also available on AWS. For more information, see the AWS Git repository.
Technical Details
The model was trained on a unique Covid-19 related dataset, which gives it an edge in analyzing modern Hebrew UGC. The high performance in polarity classification and emotion detection is a result of the carefully designed training process and the quality of the annotated data.
License
No license information is provided in the original document.
Contact Us
Thank you, תודה, شكرا
Citation
If you used this model, please cite us as:
Chriqui, A., & Yahav, I. (2022). HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition. INFORMS Journal on Data Science, forthcoming.
@article{chriqui2021hebert,
title={HeBERT \& HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition},
author={Chriqui, Avihay and Yahav, Inbal},
journal={INFORMS Journal on Data Science},
year={2022}
}