# Model Card for Fine-tuned BERT-Base-Uncased on Phishing Site Classification
This model is a fine-tuned version of BERT-Base-Uncased for phishing site classification. It predicts whether a website is "Safe" or "Not Safe" from text input, helping to strengthen online security.
## Quick Start

You can load the fine-tuned model directly from the Hugging Face Hub:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "shogun-the-great/finetuned-bert-phishing-site-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Enter your login credentials to claim a free reward!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Label 1 corresponds to "Not Safe", label 0 to "Safe"
prediction = outputs.logits.argmax(dim=-1).item()
print("Prediction:", "Not Safe" if prediction == 1 else "Safe")
```
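If you also want a confidence score, you can turn the logits into probabilities with a softmax. A minimal sketch (the hard-coded logits stand in for `model(**inputs).logits`, and index 1 is assumed to mean "Not Safe", as in the snippet above):

```python
import torch
import torch.nn.functional as F

# Stand-in logits; in practice use model(**inputs).logits
logits = torch.tensor([[-1.2, 2.3]])

# Softmax converts raw logits into probabilities over the two classes
probs = F.softmax(logits, dim=-1)
label = "Not Safe" if probs.argmax(dim=-1).item() == 1 else "Safe"
confidence = probs.max().item()
print(f"Prediction: {label} (confidence {confidence:.1%})")
```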
## Features

### Model Details

This model is a fine-tuned version of [BERT-Base-Uncased](https://huggingface.co/google-bert/bert-base-uncased) for phishing site classification. It predicts whether a website is "Safe" or "Not Safe" based on textual input.
| Property | Details |
| --- | --- |
| Developed by | [shogun-the-great](https://huggingface.co/shogun-the-great) |
| Model Type | Binary classification ("Safe" vs. "Not Safe") |
| Language(s) | English |
| License | Apache-2.0 (or specify your license) |
| Finetuned from model | google-bert/bert-base-uncased |
### Model Sources

- Dataset: [shawhin/phishing-site-classification](https://huggingface.co/datasets/shawhin/phishing-site-classification)
## Documentation

### Uses

#### Direct Use
This model can be directly used for phishing detection by classifying text into two categories: "Safe" and "Not Safe." Typical use cases include:
- Integrating with browser extensions for real-time website classification.
- Analyzing textual data for phishing indicators.
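For batch-style analysis, logits for several inputs can be mapped to labels in one pass. A sketch under the same label convention as the Quick Start (the `logits_to_labels` helper and the stand-in logits are illustrative, not part of the model's API):

```python
import torch

def logits_to_labels(logits: torch.Tensor) -> list[str]:
    """Map a batch of [N, 2] logits to labels; index 1 = "Not Safe"."""
    ids = logits.argmax(dim=-1).tolist()
    return ["Not Safe" if i == 1 else "Safe" for i in ids]

# Stand-in logits for three sites; in practice: model(**batch).logits
batch_logits = torch.tensor([[2.0, -1.0], [-0.5, 3.1], [1.2, 0.4]])
labels = logits_to_labels(batch_logits)
print(labels)
```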
#### Downstream Use
Users can fine-tune the model further for related binary classification tasks or on datasets from similar domains.
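One common downstream recipe is to freeze the encoder and train only the classification head on the new data. A minimal sketch (the `freeze_backbone` helper and the tiny demo module are illustrative, not part of this repository; with the real model, every parameter outside the `classifier` head would be frozen):

```python
import torch.nn as nn

def freeze_backbone(model: nn.Module, head_prefix: str = "classifier") -> int:
    """Disable gradients for every parameter outside the classifier head.
    Returns the number of parameter tensors frozen."""
    frozen = 0
    for name, param in model.named_parameters():
        if not name.startswith(head_prefix):
            param.requires_grad = False
            frozen += 1
    return frozen

# Tiny stand-in module mimicking an encoder + classifier layout
demo = nn.Sequential()
demo.add_module("encoder", nn.Linear(4, 4))
demo.add_module("classifier", nn.Linear(4, 2))
n_frozen = freeze_backbone(demo)
```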
#### Out-of-Scope Use
This model might not perform well for:
- Non-English text.
- Adversarial phishing attacks or heavily obfuscated text.
- Tasks unrelated to text-based classification.
### Bias, Risks, and Limitations

#### Bias
The model's predictions reflect the dataset used during fine-tuning; any biases in the training data may carry over into its predictions.
#### Risks
- False positives: Legitimate websites flagged as phishing.
- False negatives: Some phishing sites might not be detected.
- Potential vulnerabilities to adversarial examples.
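The false-positive/false-negative trade-off can be tuned by thresholding the phishing probability instead of taking the argmax. A sketch (the `classify_with_threshold` helper and the example logits are illustrative; lowering the threshold catches more phishing sites at the cost of more false positives):

```python
import torch
import torch.nn.functional as F

def classify_with_threshold(logits: torch.Tensor, threshold: float = 0.5) -> str:
    """Flag as "Not Safe" when p(phishing) exceeds `threshold` (index 1)."""
    p_not_safe = F.softmax(logits, dim=-1)[0, 1].item()
    return "Not Safe" if p_not_safe > threshold else "Safe"

# Borderline example: p(Not Safe) is roughly 0.35
logits = torch.tensor([[0.4, -0.2]])
print(classify_with_threshold(logits, threshold=0.5))  # argmax-equivalent
print(classify_with_threshold(logits, threshold=0.3))  # stricter setting
```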
#### Recommendations

> **Important Note:** Regularly update the training data and model to keep pace with emerging phishing patterns.
> **Usage Tip:** Use the model in combination with other security measures for more robust phishing detection.
## License

This model is licensed under Apache-2.0 (or specify your license).