🚀 Transformers - BERT for Financial Regulatory Multi-label Classification
This model is a fine-tuned version of BERT for multi-label classification in the financial regulatory domain. Starting from the pre-trained ProsusAI/finbert checkpoint, it was fine-tuned on a diverse dataset of financial regulatory texts, enabling it to assign multiple relevant categories to a single text.
🚀 Quick Start
This fine-tuned model can be quickly integrated into your text classification tasks in the financial regulatory domain. You can use the `transformers` library to load and use it.
✨ Features
- Multi-label Classification: Capable of classifying text into multiple relevant categories at once.
- Financial Regulatory Focus: Specifically adapted for the financial regulatory domain, leveraging a dataset of financial regulatory texts.
- Fine-tuned BERT: Built on the pre-trained ProsusAI/finbert model, with further fine-tuning for better performance.
📦 Installation
Since this model is based on the `transformers` library, you can install the dependency with the following command:

```bash
pip install transformers
```
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and model
# (replace 'your_model_path' with the actual path or Hub ID)
tokenizer = AutoTokenizer.from_pretrained('your_model_path')
model = AutoModelForSequenceClassification.from_pretrained('your_model_path')

text = "Where an FI employs a technological solution provided by an external party to conduct screening of virtual asset transactions and the associated wallet addresses, the FI remains responsible for discharging its AML/CFT obligations. The FI should conduct due diligence on the solution before deploying it, taking into account relevant factors such as:"

inputs = tokenizer(text, return_tensors='pt', truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Apply a sigmoid to each logit independently and threshold at 0.5,
# so each label is predicted on its own (multi-label, not softmax)
predicted_labels = torch.sigmoid(logits) > 0.5
print(predicted_labels)
```
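To turn the boolean prediction matrix into human-readable label names, you can use the `id2label` mapping stored in the model config. A minimal sketch, assuming `id2label` was populated during fine-tuning:

```python
# Map predicted label indices to their names via the model config
# (assumes id2label was set when the model was fine-tuned)
predicted_indices = predicted_labels[0].nonzero(as_tuple=True)[0].tolist()
predicted_names = [model.config.id2label[i] for i in predicted_indices]
print(predicted_names)
```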
Advanced Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained('your_model_path')
model = AutoModelForSequenceClassification.from_pretrained('your_model_path')

# Batch inference: pad and truncate so all sequences share the same length
texts = ["Text 1", "Text 2", "Text 3"]
inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# One boolean row of label predictions per input text
predicted_labels = torch.sigmoid(logits) > 0.5
print(predicted_labels)
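Alternatively, recent versions of `transformers` let you express the same workflow with the `pipeline` API, which handles tokenization and batching for you. A sketch, where `function_to_apply='sigmoid'` and `top_k=None` return independent per-label scores rather than a softmax over labels:

```python
from transformers import pipeline

# Text-classification pipeline configured for multi-label output
classifier = pipeline(
    'text-classification',
    model='your_model_path',
    tokenizer='your_model_path',
    function_to_apply='sigmoid',  # score each label independently
    top_k=None,                   # return scores for every label, not just the top one
)

results = classifier(["Text 1", "Text 2"])
for scores in results:
    # Keep labels whose sigmoid score clears the 0.5 threshold
    print([s['label'] for s in scores if s['score'] > 0.5])
```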
📚 Documentation
Model Architecture
| Property | Details |
|----------|---------|
| Model Type | BERT-based multi-label classification model |
| Pre-trained Model | ProsusAI/finbert |
| Task | Multi-label classification |
Performance
Performance metrics on the validation set:
- F1 Score: 0.8637
- ROC AUC: 0.9044
- Accuracy: 0.6155
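Metrics like these can be reproduced with scikit-learn on the thresholded sigmoid outputs. A minimal sketch, assuming micro averaging (the averaging scheme behind the reported numbers is an assumption) and hypothetical arrays of shape `(num_samples, num_labels)`:

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score

# Hypothetical ground-truth labels and predicted probabilities for illustration
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.4]])
y_pred = (y_prob > 0.5).astype(int)

print(f1_score(y_true, y_pred, average='micro'))      # micro-averaged F1
print(roc_auc_score(y_true, y_prob, average='micro')) # micro-averaged ROC AUC
print(accuracy_score(y_true, y_pred))  # exact-match (subset) accuracy for multi-label
```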
Limitations and Ethical Considerations
- The model's performance may vary depending on the specific nature of the text data and label distribution.
- The training dataset exhibits class imbalance, which may bias predictions toward more frequent labels (one way to inspect and mitigate this is sketched below).
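If you fine-tune further on your own data, computing per-label frequencies is a quick way to gauge the imbalance. One common mitigation, not necessarily used for this model, is a per-label `pos_weight` in `BCEWithLogitsLoss`. A hypothetical sketch:

```python
import torch

# Hypothetical multi-hot label matrix: rows = samples, columns = labels
labels = torch.tensor([[1, 0, 0], [0, 1, 1], [1, 0, 1], [0, 0, 0]], dtype=torch.float)

pos_counts = labels.sum(dim=0)                     # positives per label
neg_counts = labels.shape[0] - pos_counts          # negatives per label
pos_weight = neg_counts / pos_counts.clamp(min=1)  # up-weight rare positive labels

# Weighted loss to counteract class imbalance during fine-tuning
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```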
Dataset Information
| Property | Details |
|----------|---------|
| Training Dataset | Number of samples: 6562 |
| Validation Dataset | Number of samples: 929 |
| Test Dataset | Number of samples: 1884 |
Training Details
| Property | Details |
|----------|---------|
| Training Strategy | Fine-tuning BERT with a randomly initialized classification head |
| Optimizer | Adam |
| Learning Rate | 1e-4 |
| Batch Size | 16 |
| Number of Epochs | 2 |
| Evaluation Strategy | Epoch |
| Weight Decay | 0.01 |
| Metric for Best Model | F1 Score |
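These hyperparameters map naturally onto the `transformers` Trainer API. A sketch of a comparable setup, not the exact training script; `train_dataset`, `val_dataset`, and `compute_metrics` (which must report the F1 score under the key `'f1'`) are placeholders you would supply:

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Multi-label head: BCE-with-logits loss and a randomly initialized classifier
model = AutoModelForSequenceClassification.from_pretrained(
    'ProsusAI/finbert',
    num_labels=10,  # hypothetical; set to the number of labels in your dataset
    problem_type='multi_label_classification',
)

training_args = TrainingArguments(
    output_dir='finbert-multilabel',  # hypothetical output directory
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy='epoch',        # 'evaluation_strategy' in older transformers versions
    save_strategy='epoch',        # must match eval strategy for best-model loading
    load_best_model_at_end=True,
    metric_for_best_model='f1',   # matches "Metric for Best Model" above
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,      # tokenized training split (placeholder)
    eval_dataset=val_dataset,         # tokenized validation split (placeholder)
    compute_metrics=compute_metrics,  # must return {'f1': ...} (placeholder)
)
trainer.train()
```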