🚀 Transformers - BERT for Financial Regulatory Multi-label Classification
This model is a fine-tuned version of BERT for multi-label classification in the financial regulatory domain. Starting from the pre-trained ProsusAI/finbert checkpoint, it was fine-tuned on a diverse dataset of financial regulatory texts, enabling it to assign multiple relevant categories to a single text.
🚀 Quick Start
This fine-tuned model can be quickly integrated into your text classification tasks in the financial regulatory domain. You can use the `transformers` library to load and use it.
✨ Features
- Multi-label Classification: Capable of classifying text into multiple relevant categories at once.
- Financial Regulatory Focus: Specifically adapted for the financial regulatory domain, leveraging a dataset of financial regulatory texts.
- Fine-tuned BERT: Built on the pre-trained ProsusAI/finbert model, with further fine-tuning for better performance.
📦 Installation
Since this model is based on the `transformers` library, you can install the dependency with the following command:

```bash
pip install transformers
```
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and model
# (replace 'your_model_path' with the actual path or Hub ID)
tokenizer = AutoTokenizer.from_pretrained('your_model_path')
model = AutoModelForSequenceClassification.from_pretrained('your_model_path')

text = "Where an FI employs a technological solution provided by an external party to conduct screening of virtual asset transactions and the associated wallet addresses, the FI remains responsible for discharging its AML/CFT obligations. The FI should conduct due diligence on the solution before deploying it, taking into account relevant factors such as:"

inputs = tokenizer(text, return_tensors='pt', truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Apply a sigmoid to each logit independently and threshold at 0.5,
# so each label is predicted on its own (multi-label, not softmax)
predicted_labels = torch.sigmoid(logits) > 0.5
print(predicted_labels)
```
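To turn the boolean prediction matrix into human-readable label names, you can use the `id2label` mapping stored in the model config. A minimal sketch, assuming `id2label` was populated during fine-tuning:

```python
# Map predicted label indices to their names via the model config
# (assumes id2label was set when the model was fine-tuned)
predicted_indices = predicted_labels[0].nonzero(as_tuple=True)[0].tolist()
predicted_names = [model.config.id2label[i] for i in predicted_indices]
print(predicted_names)
```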
Advanced Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained('your_model_path')
model = AutoModelForSequenceClassification.from_pretrained('your_model_path')

# Batch inference: pad and truncate so all sequences share the same length
texts = ["Text 1", "Text 2", "Text 3"]
inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# One boolean row of label predictions per input text
predicted_labels = torch.sigmoid(logits) > 0.5
print(predicted_labels)
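Alternatively, recent versions of `transformers` let you express the same workflow with the `pipeline` API, which handles tokenization and batching for you. A sketch, where `function_to_apply='sigmoid'` and `top_k=None` return independent per-label scores rather than a softmax over labels:

```python
from transformers import pipeline

# Text-classification pipeline configured for multi-label output
classifier = pipeline(
    'text-classification',
    model='your_model_path',
    tokenizer='your_model_path',
    function_to_apply='sigmoid',  # score each label independently
    top_k=None,                   # return scores for every label, not just the top one
)

results = classifier(["Text 1", "Text 2"])
for scores in results:
    # Keep labels whose sigmoid score clears the 0.5 threshold
    print([s['label'] for s in scores if s['score'] > 0.5])
```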
📚 Documentation
Model Architecture
| Property | Details |
|----------|---------|
| Model Type | BERT-based multi-label classification model |
| Pre-trained Model | ProsusAI/finbert |
| Task | Multi-label classification |
Performance
Performance metrics on the validation set:
- F1 Score: 0.8637
- ROC AUC: 0.9044
- Accuracy: 0.6155
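Metrics like these can be reproduced with scikit-learn on the thresholded sigmoid outputs. A minimal sketch, assuming micro averaging (the averaging scheme behind the reported numbers is an assumption) and hypothetical arrays of shape `(num_samples, num_labels)`:

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score

# Hypothetical ground-truth labels and predicted probabilities for illustration
y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7], [0.1, 0.8, 0.4]])
y_pred = (y_prob > 0.5).astype(int)

print(f1_score(y_true, y_pred, average='micro'))      # micro-averaged F1
print(roc_auc_score(y_true, y_prob, average='micro')) # micro-averaged ROC AUC
print(accuracy_score(y_true, y_pred))  # exact-match (subset) accuracy for multi-label
```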
Limitations and Ethical Considerations
- The model's performance may vary depending on the specific nature of the text data and label distribution.
- The training dataset exhibits class imbalance, which may bias predictions toward more frequent labels (one way to inspect and mitigate this is sketched below).
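If you fine-tune further on your own data, computing per-label frequencies is a quick way to gauge the imbalance. One common mitigation, not necessarily used for this model, is a per-label `pos_weight` in `BCEWithLogitsLoss`. A hypothetical sketch:

```python
import torch

# Hypothetical multi-hot label matrix: rows = samples, columns = labels
labels = torch.tensor([[1, 0, 0], [0, 1, 1], [1, 0, 1], [0, 0, 0]], dtype=torch.float)

pos_counts = labels.sum(dim=0)                     # positives per label
neg_counts = labels.shape[0] - pos_counts          # negatives per label
pos_weight = neg_counts / pos_counts.clamp(min=1)  # up-weight rare positive labels

# Weighted loss to counteract class imbalance during fine-tuning
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```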
Dataset Information
| Property | Details |
|----------|---------|
| Training Dataset | Number of samples: 6562 |
| Validation Dataset | Number of samples: 929 |
| Test Dataset | Number of samples: 1884 |
Training Details
| Property | Details |
|----------|---------|
| Training Strategy | Fine-tuning BERT with a randomly initialized classification head |
| Optimizer | Adam |
| Learning Rate | 1e-4 |
| Batch Size | 16 |
| Number of Epochs | 2 |
| Evaluation Strategy | Epoch |
| Weight Decay | 0.01 |
| Metric for Best Model | F1 Score |
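These hyperparameters map naturally onto the `transformers` Trainer API. A sketch of a comparable setup, not the exact training script; `train_dataset`, `val_dataset`, and `compute_metrics` (which must report the F1 score under the key `'f1'`) are placeholders you would supply:

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Multi-label head: BCE-with-logits loss and a randomly initialized classifier
model = AutoModelForSequenceClassification.from_pretrained(
    'ProsusAI/finbert',
    num_labels=10,  # hypothetical; set to the number of labels in your dataset
    problem_type='multi_label_classification',
)

training_args = TrainingArguments(
    output_dir='finbert-multilabel',  # hypothetical output directory
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy='epoch',        # 'evaluation_strategy' in older transformers versions
    save_strategy='epoch',        # must match eval strategy for best-model loading
    load_best_model_at_end=True,
    metric_for_best_model='f1',   # matches "Metric for Best Model" above
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,      # tokenized training split (placeholder)
    eval_dataset=val_dataset,         # tokenized validation split (placeholder)
    compute_metrics=compute_metrics,  # must return {'f1': ...} (placeholder)
)
trainer.train()
```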