bert-base-finance-sentiment-noisy-search
This model is a fine-tuned version of BERT for finance news sentiment analysis; data enhancement with noisy search boosts its test-set accuracy from about 88% to over 95%.
Quick Start
You can use this model for finance news sentiment analysis with three labels: "Positive", "Neutral" and "Negative". For best results, feed the classifier the title plus either the first paragraph or a short news summary of up to about 64 tokens.
Features
- Fine-tuned from [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the Kaggle finance news sentiment analysis dataset.
- Utilizes noisy search for data enhancement, significantly boosting the model's accuracy from about 88% to over 95%.
Installation
No special installation is required beyond the Hugging Face Transformers library used in the examples below.
Usage Examples
Basic Usage
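A minimal sketch using the Transformers `pipeline` API. The bare model id below is a placeholder; substitute the model's full Hub id (e.g. `<org>/bert-base-finance-sentiment-noisy-search`). The example text and printed output are illustrative only.

```python
from transformers import pipeline

# Placeholder: replace with the model's full Hub id,
# e.g. "<org>/bert-base-finance-sentiment-noisy-search"
model_id = "bert-base-finance-sentiment-noisy-search"

classifier = pipeline("text-classification", model=model_id)

# Title + first paragraph (or a short summary) of a finance news item
text = (
    "Quarterly profit rose 20% on strong demand. "
    "The company also raised its full-year revenue guidance."
)

print(classifier(text))
# Illustrative output shape: [{'label': 'Positive', 'score': 0.98}]
```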
Advanced Usage
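A sketch of lower-level inference with `AutoTokenizer` and `AutoModelForSequenceClassification`, truncating the input to 64 tokens as recommended in the Quick Start. The Hub id is again a placeholder, and the label names come from the model's `id2label` config rather than from this snippet.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "bert-base-finance-sentiment-noisy-search"  # placeholder: use the full Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

title = "Shares slide as quarterly loss widens"
first_paragraph = "The retailer reported a larger-than-expected loss and cut its full-year outlook."

# Keep the input short: title + first paragraph, truncated to 64 tokens
inputs = tokenizer(
    title + " " + first_paragraph,
    truncation=True,
    max_length=64,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1).squeeze()
label = model.config.id2label[int(probs.argmax())]
print(label, probs.tolist())
```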
Documentation
This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on Kaggle finance news sentiment analysis with data enhancement using noisy search. The process is as follows:
- First, "bert-base-uncased" was fine-tuned on Kaggle's finance news sentiment analysis dataset (https://www.kaggle.com/ankurzing/sentiment-analysis-for-financial-news), achieving an accuracy of about 88%.
- A logistic-regression classifier was then trained on the same data. Inspecting only bi-grams, we looked at the coefficients that contributed the most to the "Positive" and "Negative" classes (see the first sketch after this list).
- Using the top 25 bi-grams per class (i.e., "Positive" / "Negative"), we invoked Bing news search with those bi-grams and retrieved up to 50 news items per bi-gram phrase (see the second sketch after this list).
- We call it "noisy search" because it is assumed that positive bi-grams (e.g., "profit rose", "growth net") give rise to positive examples, whereas negative bi-grams (e.g., "loss increase", "share loss") result in negative examples. Note that we didn't test the validity of this assumption (hence: noisy search).
- For each retrieved article, we kept the title + excerpt and labeled it according to the class assumed for its bi-gram.
- We then trained the same model on the noisy data and evaluated it on a held-out test set from the original dataset split.
- Training with a couple of thousand noisy "positive" and "negative" examples yielded a test-set accuracy of about 95%.
- This shows that by automatically collecting noisy examples using search, we can boost accuracy from about 88% to more than 95%.
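The bi-gram selection step referenced above can be reproduced roughly as follows. This is a minimal scikit-learn sketch under the assumption that the Kaggle data is available as parallel lists `texts` and `labels`; the tiny inline examples and the helper `top_bigrams` are illustrative, not the authors' original code.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Illustrative placeholders for the Kaggle finance news data
texts = [
    "Quarterly profit rose sharply on strong demand",
    "Company warns of share loss after weak quarter",
    "Board of directors meets on Tuesday",
]
labels = ["Positive", "Negative", "Neutral"]

# Bag-of-bi-grams features only
vectorizer = CountVectorizer(ngram_range=(2, 2))
X = vectorizer.fit_transform(texts)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
feature_names = np.array(vectorizer.get_feature_names_out())

def top_bigrams(class_name, k=25):
    """Return the k bi-grams whose coefficients contribute most to a class."""
    class_idx = list(clf.classes_).index(class_name)
    order = np.argsort(clf.coef_[class_idx])[::-1]
    return feature_names[order[:k]].tolist()

print(top_bigrams("Positive"))
print(top_bigrams("Negative"))
```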
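The noisy-search collection and labeling step could look roughly like the sketch below. It assumes the Bing News Search v7 REST API with a subscription key; the endpoint, query parameters, the helper name `search_news`, and the hard-coded bi-gram lists are all illustrative assumptions rather than the authors' actual pipeline.

```python
import requests

# Assumptions: Bing News Search v7 endpoint plus a subscription key
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/news/search"
BING_KEY = "..."  # your subscription key

def search_news(query, count=50):
    """Retrieve up to `count` news items for a query phrase."""
    resp = requests.get(
        BING_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": BING_KEY},
        params={"q": query, "count": count, "mkt": "en-US"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("value", [])

# Illustrative bi-grams; the real top-25 lists come from the
# logistic-regression step sketched above
bigrams_by_label = {
    "Positive": ["profit rose", "growth net"],
    "Negative": ["loss increase", "share loss"],
}

noisy_examples = []
for label, phrases in bigrams_by_label.items():
    for phrase in phrases:
        for item in search_news(phrase):
            # Keep title + excerpt and label it with the class assumed for the bi-gram
            text = f"{item.get('name', '')} {item.get('description', '')}".strip()
            noisy_examples.append({"text": text, "label": label})
```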
Accuracy results for Logistic Regression (LR) and BERT (base-cased) are shown in the attached pdf:
https://drive.google.com/file/d/1MI9gRdppactVZ_XvhCwvoaOV1aRfprrd/view?usp=sharing
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
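For reference, here is a minimal sketch of how the hyperparameters above might be expressed as Transformers `TrainingArguments`. Mapping `train_batch_size`/`eval_batch_size` to the per-device arguments and the output directory name are assumptions; these arguments would then be passed to a `Trainer` together with the tokenized datasets (not shown).

```python
from transformers import TrainingArguments

# Hyperparameters listed above, expressed as TrainingArguments
training_args = TrainingArguments(
    output_dir="bert-base-finance-sentiment-noisy-search",  # illustrative output dir
    learning_rate=5e-05,
    per_device_train_batch_size=8,   # assumed mapping of train_batch_size
    per_device_eval_batch_size=8,    # assumed mapping of eval_batch_size
    seed=42,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```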
Framework versions
- Transformers 4.16.2
- Pytorch 1.10.0+cu111
- Datasets 1.18.3
- Tokenizers 0.11.0
Technical Details
The model fine-tunes "bert-base-uncased" on Kaggle's finance news sentiment analysis dataset. After the initial fine-tuning, a logistic-regression classifier is used to analyze bi-grams. Then, "noisy search" is employed to collect additional data. The model is retrained on the noisy data, and the results show a significant improvement in accuracy.
License
This model is licensed under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | bert-base-finance-sentiment-noisy-search |
| Training Data | Kaggle finance news sentiment analysis dataset (https://www.kaggle.com/ankurzing/sentiment-analysis-for-financial-news) and data collected through noisy search |