🚀 LogClassifier
LogClassifier is a transformers-based classification model developed by Selector AI. It is designed for network and device log mining: it classifies logs into predefined categories to facilitate network issue diagnosis.
🚀 Quick Start
Prerequisites
Ensure you have installed the necessary libraries, including transformers and torch.
Installation
pip install transformers torch
Usage
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = BertForSequenceClassification.from_pretrained("rahulm-selector/log-classifier-BERT-v1")
tokenizer = BertTokenizer.from_pretrained("rahulm-selector/log-classifier-BERT-v1")
model.eval()  # disable dropout for inference

# Tokenize a sample log line
log_text = "Error occurred while accessing the database."
inputs = tokenizer(log_text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Run the forward pass without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Map the highest-scoring class index to its event name
predicted_class = torch.argmax(logits, dim=1).item()
label_mapping = model.config.id2label
predicted_event = label_mapping[predicted_class]
print(f"Predicted Event: {predicted_event}")
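The argmax above returns only the single best class. When a confidence score or a few runner-up events would help, the logits can be turned into probabilities with a softmax. The sketch below is a minimal, pure-Python illustration: `top_k_events` is a hypothetical helper (not part of transformers), and the sample logits and label names are made up for demonstration.

```python
import math

def top_k_events(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs for one log.

    `logits` is a flat list of floats, e.g. `outputs.logits[0].tolist()`.
    `id2label` maps class index -> event name, e.g. `model.config.id2label`.
    This helper is illustrative and not part of the transformers API.
    """
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Made-up logits and labels, for demonstration only.
example_logits = [0.2, 3.1, -1.0, 1.4]
example_labels = {0: "event_a", 1: "event_b", 2: "event_c", 3: "event_d"}
for label, prob in top_k_events(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the usage snippet, the equivalent call would be `top_k_events(outputs.logits[0].tolist(), model.config.id2label)`.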
✨ Features
- Focus on Structured and Semi-structured Logs: The model focuses on structured and semi-structured log data, outputting around 60 different event categories.
- Real-time Analysis: It is highly effective for real-time log analysis, anomaly detection, and operational monitoring.
- Automated Classification: Helps organizations manage large-scale network data by automatically classifying logs into predefined categories, facilitating faster and more accurate diagnosis of network issues.
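For larger log volumes, classifying in batches is usually cheaper than one forward pass per line. The sketch below groups an incoming stream of log lines into fixed-size batches and tallies the predicted events. It rests on assumptions: `classify_batch` is a hypothetical callable standing in for a tokenizer-plus-model call, `keyword_classifier` is a toy stand-in so the example runs without downloading anything, and the default batch size is arbitrary.

```python
from collections import Counter
from itertools import islice

def batched(lines, batch_size):
    """Yield successive lists of at most `batch_size` items from an iterable."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def count_events(lines, classify_batch, batch_size=32):
    """Classify a stream of log lines batch by batch and count each event.

    `classify_batch` is a hypothetical callable: list[str] -> list[str].
    In practice it would wrap the tokenizer and model calls.
    """
    counts = Counter()
    for batch in batched(lines, batch_size):
        counts.update(classify_batch(batch))
    return counts

def keyword_classifier(batch):
    """Toy stand-in for the real model, used only to make the sketch runnable."""
    return ["link_event" if "link" in line.lower() else "other" for line in batch]

logs = ["Link down on eth0", "BGP neighbor changed state", "Link up on eth0"]
print(count_events(logs, keyword_classifier, batch_size=2))
```

Because `batched` consumes any iterable lazily, the same loop works for a list in memory or a generator reading from a log source.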
📚 Documentation
Background
The model focuses on structured and semi-structured log data and outputs around 60 different event categories. It is highly effective for real-time log analysis, anomaly detection, and operational monitoring. By automatically classifying logs into predefined categories, it helps organizations manage large-scale network data and diagnose network issues faster and more accurately.
Intended Uses
Our model is intended to be used as a classifier. Given an input text (a log coming from a network, device, or router), it outputs the event most closely associated with the log. The possible events are listed in [encoder-main.json](https://huggingface.co/rahulm-selector/log-classifier-BERT-v1/blob/main/encoder-main.json).
Training Details
Data
The model was trained on a variety of network events and system logs, focusing on monitoring and analyzing state changes, protocol behaviors, and hardware interactions across infrastructure components. This included tracking routing issues, protocol neighbor state changes, link stability, and security events, ensuring that the model could recognize and classify critical patterns in device communications, network health, and configuration activities.
Train/Test Split
| Property | Details |
| --- | --- |
| Train Data Size | ~80K Logs |
| Test Data Size | ~20K Logs |
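The sizes above correspond to roughly an 80/20 split. The source does not describe how the split was produced, so the following is only a sketch of one common approach, a seeded random shuffle; `split_logs` and its parameters are hypothetical.

```python
import random

def split_logs(logs, test_fraction=0.2, seed=0):
    """Shuffle a copy of `logs` and split it into (train, test) lists.

    A seeded shuffle is an assumption here, not the documented procedure.
    """
    shuffled = list(logs)
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

# Placeholder records standing in for real log lines.
all_logs = [f"log line {i}" for i in range(100)]
train, test = split_logs(all_logs)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing models on the same held-out logs.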
Hyperparameters
The following hyperparameters were used during training to optimize the model's performance:
| Property | Details |
| --- | --- |
| Batch Size | 32 |
| Learning Rate | 0.001 |
| Optimizer | Adam |
| Epochs | 10 |
| Dropout Rate | N/A |
| LSTM Hidden Dimension | 384 |
| Embedding Dimension | 384 |
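To make these values concrete, they can be collected in a small config object and used to estimate the number of optimizer steps. This is a sketch: `TrainingConfig` is a hypothetical container, and the step count assumes exactly 80,000 training logs, whereas the source only gives an approximate size.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container mirroring the listed hyperparameters."""
    batch_size: int = 32
    learning_rate: float = 0.001
    optimizer: str = "Adam"
    epochs: int = 10
    lstm_hidden_dim: int = 384
    embedding_dim: int = 384

def total_steps(num_examples, config):
    """Optimizer steps over all epochs, counting a final partial batch."""
    steps_per_epoch = math.ceil(num_examples / config.batch_size)
    return steps_per_epoch * config.epochs

config = TrainingConfig()
# 80_000 is an assumed exact value for the "~80K" training set.
print(total_steps(80_000, config))
```

Under that assumption, training runs 2,500 steps per epoch (80,000 / 32) and 25,000 steps over 10 epochs.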