bert-base-finance-sentiment-noisy-search
This model is a fine-tuned version of BERT for finance news sentiment analysis; data enhancement with noisy search boosts its test-set accuracy from about 88% to over 95%.
Quick Start
You can use this model for finance news sentiment analysis with three labels: "Positive", "Neutral" and "Negative". For best results, feed the classifier the title plus either the first paragraph or a short news summary of up to about 64 tokens.
Features
- Fine-tuned from [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the Kaggle finance news sentiment analysis dataset.
- Utilizes noisy search for data enhancement, significantly boosting the model's accuracy from about 88% to over 95%.
Installation
No special installation is required beyond the Hugging Face Transformers library used in the examples below.
Usage Examples
Basic Usage
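A minimal sketch using the Transformers `pipeline` API. The bare model id below is a placeholder; substitute the model's full Hub id (e.g. `<org>/bert-base-finance-sentiment-noisy-search`). The example text and printed output are illustrative only.

```python
from transformers import pipeline

# Placeholder: replace with the model's full Hub id,
# e.g. "<org>/bert-base-finance-sentiment-noisy-search"
model_id = "bert-base-finance-sentiment-noisy-search"

classifier = pipeline("text-classification", model=model_id)

# Title + first paragraph (or a short summary) of a finance news item
text = (
    "Quarterly profit rose 20% on strong demand. "
    "The company also raised its full-year revenue guidance."
)

print(classifier(text))
# Illustrative output shape: [{'label': 'Positive', 'score': 0.98}]
```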
Advanced Usage
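A sketch of lower-level inference with `AutoTokenizer` and `AutoModelForSequenceClassification`, truncating the input to 64 tokens as recommended in the Quick Start. The Hub id is again a placeholder, and the label names come from the model's `id2label` config rather than from this snippet.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "bert-base-finance-sentiment-noisy-search"  # placeholder: use the full Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

title = "Shares slide as quarterly loss widens"
first_paragraph = "The retailer reported a larger-than-expected loss and cut its full-year outlook."

# Keep the input short: title + first paragraph, truncated to 64 tokens
inputs = tokenizer(
    title + " " + first_paragraph,
    truncation=True,
    max_length=64,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1).squeeze()
label = model.config.id2label[int(probs.argmax())]
print(label, probs.tolist())
```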
Documentation
This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on Kaggle finance news sentiment analysis with data enhancement using noisy search. The process is as follows:
- First, "bert-base-uncased" was fine-tuned on Kaggle's finance news sentiment analysis dataset (https://www.kaggle.com/ankurzing/sentiment-analysis-for-financial-news), achieving an accuracy of about 88%.
- A logistic-regression classifier was then trained on the same data. Inspecting only bi-grams, we looked at the coefficients that contributed the most to the "Positive" and "Negative" classes (see the first sketch after this list).
- Using the top 25 bi-grams per class (i.e., "Positive" / "Negative"), we invoked Bing news search with those bi-grams and retrieved up to 50 news items per bi-gram phrase (see the second sketch after this list).
- We call it "noisy search" because it is assumed that positive bi-grams (e.g., "profit rose", "growth net") give rise to positive examples, whereas negative bi-grams (e.g., "loss increase", "share loss") result in negative examples. Note that we didn't test the validity of this assumption (hence: noisy search).
- For each retrieved article, we kept the title + excerpt and labeled it according to the class assumed for its bi-gram.
- We then trained the same model on the noisy data and evaluated it on a held-out test set from the original dataset split.
- Training with a couple of thousand noisy "positive" and "negative" examples yielded a test-set accuracy of about 95%.
- This shows that by automatically collecting noisy examples using search, we can boost accuracy from about 88% to more than 95%.
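The bi-gram selection step referenced above can be reproduced roughly as follows. This is a minimal scikit-learn sketch under the assumption that the Kaggle data is available as parallel lists `texts` and `labels`; the tiny inline examples and the helper `top_bigrams` are illustrative, not the authors' original code.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Illustrative placeholders for the Kaggle finance news data
texts = [
    "Quarterly profit rose sharply on strong demand",
    "Company warns of share loss after weak quarter",
    "Board of directors meets on Tuesday",
]
labels = ["Positive", "Negative", "Neutral"]

# Bag-of-bi-grams features only
vectorizer = CountVectorizer(ngram_range=(2, 2))
X = vectorizer.fit_transform(texts)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
feature_names = np.array(vectorizer.get_feature_names_out())

def top_bigrams(class_name, k=25):
    """Return the k bi-grams whose coefficients contribute most to a class."""
    class_idx = list(clf.classes_).index(class_name)
    order = np.argsort(clf.coef_[class_idx])[::-1]
    return feature_names[order[:k]].tolist()

print(top_bigrams("Positive"))
print(top_bigrams("Negative"))
```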
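The noisy-search collection and labeling step could look roughly like the sketch below. It assumes the Bing News Search v7 REST API with a subscription key; the endpoint, query parameters, the helper name `search_news`, and the hard-coded bi-gram lists are all illustrative assumptions rather than the authors' actual pipeline.

```python
import requests

# Assumptions: Bing News Search v7 endpoint plus a subscription key
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/news/search"
BING_KEY = "..."  # your subscription key

def search_news(query, count=50):
    """Retrieve up to `count` news items for a query phrase."""
    resp = requests.get(
        BING_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": BING_KEY},
        params={"q": query, "count": count, "mkt": "en-US"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("value", [])

# Illustrative bi-grams; the real top-25 lists come from the
# logistic-regression step sketched above
bigrams_by_label = {
    "Positive": ["profit rose", "growth net"],
    "Negative": ["loss increase", "share loss"],
}

noisy_examples = []
for label, phrases in bigrams_by_label.items():
    for phrase in phrases:
        for item in search_news(phrase):
            # Keep title + excerpt and label it with the class assumed for the bi-gram
            text = f"{item.get('name', '')} {item.get('description', '')}".strip()
            noisy_examples.append({"text": text, "label": label})
```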
Accuracy results for Logistic Regression (LR) and BERT (base-cased) are shown in the attached pdf:
https://drive.google.com/file/d/1MI9gRdppactVZ_XvhCwvoaOV1aRfprrd/view?usp=sharing
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
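For reference, here is a minimal sketch of how the hyperparameters above might be expressed as Transformers `TrainingArguments`. Mapping `train_batch_size`/`eval_batch_size` to the per-device arguments and the output directory name are assumptions; these arguments would then be passed to a `Trainer` together with the tokenized datasets (not shown).

```python
from transformers import TrainingArguments

# Hyperparameters listed above, expressed as TrainingArguments
training_args = TrainingArguments(
    output_dir="bert-base-finance-sentiment-noisy-search",  # illustrative output dir
    learning_rate=5e-05,
    per_device_train_batch_size=8,   # assumed mapping of train_batch_size
    per_device_eval_batch_size=8,    # assumed mapping of eval_batch_size
    seed=42,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```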
Framework versions
- Transformers 4.16.2
- Pytorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0
Technical Details
The model fine-tunes "bert-base-uncased" on Kaggle's finance news sentiment analysis dataset. After the initial fine-tuning, a logistic-regression classifier is used to analyze bi-grams. Then, "noisy search" is employed to collect additional data. The model is retrained on the noisy data, and the results show a significant improvement in accuracy.
License
This model is licensed under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | bert-base-finance-sentiment-noisy-search |
| Training Data | Kaggle finance news sentiment analysis dataset (https://www.kaggle.com/ankurzing/sentiment-analysis-for-financial-news) and data collected through noisy search |