Final-complete-malicious-url-model Open Source Model - Free Deployment for Efficient Detection of Malicious URL Threats

Final Complete Malicious Url Model

Developed by r3ddkahili

A BERT-LoRA fine-tuned model for efficient detection of malicious URLs, including phishing, malware, and defacement threats.

Text Classification

Transformers

English

Open Source License: Apache-2.0

#BERT-LoRA Fine-tuning #Real-time URL Detection #High-precision Classification

Downloads 434

Release Time : 1/21/2025

Model Overview

This model fine-tunes BERT with Low-Rank Adaptation (LoRA) to classify URLs in real time as benign, defacement, phishing, or malware, with a reported validation accuracy of 98%.

Model Features

Efficient Fine-tuning

Utilizes LoRA (Low-Rank Adaptation) technology to reduce computational costs while maintaining high accuracy.

High Accuracy

Achieves a validation accuracy of 98% and an F1 score of 0.965, ensuring robust detection capabilities.

Multi-category Detection

Classifies URLs into four categories: benign, defacement, phishing, and malware.

Model Capabilities

Malicious URL Detection

Phishing URL Identification

Malware URL Identification

Defacement URL Identification

Use Cases

Cybersecurity

Real-time URL Classification

Integrated into cybersecurity tools to classify accessed URLs in real time, with a reported accuracy of 98%.

Browser Extension

Planned development of a browser extension to provide instant threat alerts.

Security Monitoring

SOC Integration

Used in Security Operations Centers (SOC) for security monitoring and threat analysis.

🚀 Malicious URL Detection Model

A fine-tuned BERT-LoRA model for detecting malicious URLs, including phishing, malware, and defacement threats.

🚀 Quick Start

This is a fine-tuned BERT-based classifier for real-time detection of malicious URLs. It uses Low-Rank Adaptation (LoRA) for efficient fine-tuning, reducing computational cost while maintaining high accuracy.

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "your-huggingface-model-name"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example URL
url = "http://example.com/login"

# Tokenize and predict
inputs = tokenizer(url, return_tensors="pt", truncation=True, padding=True, max_length=128)
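with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)
    # Take the argmax over the class dimension; .item() works because the batch holds one URL
    prediction = torch.argmax(outputs.logits, dim=-1).item()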

# Mapping prediction to labels
label_map = {0: "Benign", 1: "Defacement", 2: "Phishing", 3: "Malware"}
print(f"Prediction: {label_map[prediction]}")

✨ Features

Classifies URLs into four categories: Benign, Defacement, Phishing, and Malware.
Achieves 98% validation accuracy and an F1-score of 0.965.

📚 Documentation

Intended Uses

Use Cases

Real-time URL classification for cybersecurity tools.
Phishing and malware detection for online safety.
Integration into browser extensions for instant threat alerts.
Security monitoring for SOC (Security Operations Centers).

Model Details

Property	Details
Model Type	BERT-based URL Classifier
Fine-Tuning Method	LoRA (Low-Rank Adaptation)
Base Model	`bert-base-uncased`
Number of Parameters	110M
Dataset	Kaggle Malicious URLs Dataset (~651,191 samples)
Max Sequence Length	`128`
Framework	🤗 `transformers`, `torch`, `peft`
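
To make the cost saving from LoRA concrete, the sketch below counts the parameters a LoRA adapter would train on `bert-base-uncased`. The rank (`r = 8`) and the choice to adapt only the query and value projections in each layer are assumptions for illustration; the model card does not state the adapter configuration that was actually used.

```python
# Rough count of trainable LoRA parameters for a BERT-base encoder.
# ASSUMPTIONS (not stated in the model card): rank r = 8, adapters on the
# query and value projections of every encoder layer.

HIDDEN_SIZE = 768            # hidden width of bert-base-uncased
NUM_LAYERS = 12              # encoder layers in bert-base-uncased
BASE_PARAMS = 110_000_000    # approximate base model size (110M)


def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    """LoRA learns an update B @ A for a d_out x d_in weight,
    where A is rank x d_in and B is d_out x rank."""
    return rank * d_in + d_out * rank


def lora_trainable_params(rank: int, matrices_per_layer: int = 2) -> int:
    """Total adapter parameters across all layers (classifier head excluded)."""
    per_matrix = lora_params_per_matrix(HIDDEN_SIZE, HIDDEN_SIZE, rank)
    return NUM_LAYERS * matrices_per_layer * per_matrix


adapter_params = lora_trainable_params(rank=8)
print(f"Adapter parameters: {adapter_params:,}")
print(f"Share of base model: {adapter_params / BASE_PARAMS:.2%}")
```

Under these assumptions the adapter is a fraction of a percent of the base model, which is where the reduced fine-tuning cost comes from. The classification head is trained in addition to the adapter and is not counted here.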

Training Details

Batch Size: 16
Epochs: 5
Learning Rate: 2e-5
Optimizer: AdamW with weight decay
Loss Function: Weighted Cross-Entropy
Evaluation Strategy: Epoch-based
Fine-Tuning Strategy: LoRA applied to BERT layers
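
Weighted cross-entropy scales each sample's loss by a weight for its true class, so mistakes on rare classes cost more. The sketch below uses inverse-frequency weights; both the weighting scheme and the class counts are hypothetical, since the model card does not say how the weights were chosen.

```python
import math

# HYPOTHETICAL class counts for illustration only; not the real dataset split.
class_counts = {"Benign": 6000, "Defacement": 2000, "Phishing": 1500, "Malware": 500}


def inverse_frequency_weights(counts: dict) -> dict:
    """weight_c = total / (num_classes * count_c); balanced classes all get 1.0."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


def weighted_cross_entropy(logits: list, target: int, weights: list) -> float:
    """Loss for one sample: -weights[target] * log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_prob = logits[target] - log_sum_exp
    return -weights[target] * log_prob


weights = inverse_frequency_weights(class_counts)
weight_list = list(weights.values())  # index order follows class_counts
logits = [0.2, 0.1, 0.3, 1.5]         # made-up model scores for one URL

print(weights)
print(weighted_cross_entropy(logits, target=3, weights=weight_list))
```

In PyTorch the same idea is usually expressed by passing a weight tensor to `torch.nn.CrossEntropyLoss(weight=...)`.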

Evaluation Results

Metric	Value
Accuracy	98%
Precision	0.96
Recall	0.97
F1 Score	0.965

Category-wise Performance

Category	Precision	Recall	F1-Score
Benign	0.98	0.99	0.985
Defacement	0.98	0.99	0.985
Phishing	0.93	0.94	0.935
Malware	0.95	0.96	0.955
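
The F1 values in both tables are consistent with the listed precision and recall (F1 is their harmonic mean), and the overall precision and recall match the unweighted average of the four category rows. The check below reproduces that arithmetic; reading the overall figures as macro averages is an inference from the numbers, not something the model card states.

```python
# Per-category (precision, recall) as reported in the category-wise table.
per_category = {
    "Benign": (0.98, 0.99),
    "Defacement": (0.98, 0.99),
    "Phishing": (0.93, 0.94),
    "Malware": (0.95, 0.96),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for label, (p, r) in per_category.items():
    print(f"{label}: F1 = {f1_score(p, r):.3f}")

# Unweighted (macro) averages over the four categories.
macro_precision = sum(p for p, _ in per_category.values()) / len(per_category)
macro_recall = sum(r for _, r in per_category.values()) / len(per_category)

print(f"Macro precision = {macro_precision:.2f}, macro recall = {macro_recall:.2f}")
print(f"F1 from macro precision and recall = {f1_score(macro_precision, macro_recall):.3f}")
```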

Deployment Options

Streamlit Web App

Deployed on Streamlit Cloud, AWS, or Google Cloud.
Provides real-time URL analysis with a user-friendly interface.

Browser Extension (Planned)

Real-time scanning of visited web pages.
Dynamic threat alerts with confidence scores.
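
A confidence score for an alert can be derived by applying softmax to the classifier's four logits. The sketch below shows one way to do that in plain Python; the `alert_for` helper, the 0.80 threshold, and the example logits are hypothetical, and the label order follows the `label_map` in the Quick Start.

```python
import math

LABELS = ["Benign", "Defacement", "Phishing", "Malware"]  # same order as label_map


def softmax(logits: list) -> list:
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alert_for(logits: list, threshold: float = 0.80) -> tuple:
    """Return (label, confidence, should_alert).

    An alert fires only for a non-benign label whose probability meets
    the threshold. The threshold value is an assumption, not a tuned setting.
    """
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    label, confidence = LABELS[idx], probs[idx]
    should_alert = label != "Benign" and confidence >= threshold
    return label, confidence, should_alert


label, confidence, should_alert = alert_for([0.1, -0.5, 3.2, 0.4])  # made-up logits
print(f"{label} ({confidence:.1%}), alert={should_alert}")
```

Raising the threshold trades missed threats for fewer false alarms, so in practice it would be chosen from validation data rather than fixed in advance.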

API Integration

REST API for bulk URL analysis.
Supports Security Operations Centers (SOC).
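
Bulk analysis behind an API usually means deduplicating the submitted URLs and scoring them in fixed-size batches. The helper below sketches that flow; `classify_bulk`, its `classify_batch` callback, and the stub classifier are hypothetical names, and the batch size of 16 simply mirrors the training batch size listed above.

```python
from typing import Callable, Dict, Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_bulk(
    urls: List[str],
    classify_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> Dict[str, str]:
    """Map each unique, non-empty URL to the label returned by classify_batch."""
    # dict.fromkeys removes duplicates while keeping first-seen order
    unique_urls = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    results: Dict[str, str] = {}
    for batch in batched(unique_urls, batch_size):
        labels = classify_batch(batch)
        if len(labels) != len(batch):
            raise ValueError("classifier returned a different number of labels than URLs")
        results.update(zip(batch, labels))
    return results


def stub_classifier(batch: List[str]) -> List[str]:
    """Placeholder for the real model call; NOT a real detection rule."""
    return ["Phishing" if "login" in url else "Benign" for url in batch]


report = classify_bulk(
    ["http://example.com/login", "http://example.com/", "http://example.com/login"],
    stub_classifier,
)
print(report)
```

In a real service, `classify_batch` would wrap the tokenizer and model call from the Quick Start, with the tokenizer given the whole batch at once.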

Limitations & Bias

⚠️ Important Note

The model may misclassify complex phishing URLs that mimic legitimate sites.

It needs regular updates to counter evolving threats.

There is potential bias if future threats are not represented in training data.

Training Data & Citation

Data Source

Dataset sourced from the Kaggle Malicious URLs Dataset.

BibTeX Citation

@article{maliciousurl2025,
  author    = {Gleyzie Tongo and Farnaz Farid and Ala Al-Areqi and Farhad Ahamed},
  title     = {Fine-Tuned BERT for Malicious URL Detection},
  year      = {2025},
  institution = {Western Sydney University}
}

Contact

For inquiries, collaborations, or feedback, feel free to reach out via LinkedIn:
🔗 Gleyzie Tongo

📄 License

This project is licensed under the Apache-2.0 license.
