# Model Card for Fine-tuned BERT-Base-Uncased on Phishing Site Classification
This model is a fine-tuned version of BERT-Base-Uncased for phishing site classification. It predicts whether a website is "Safe" or "Not Safe" from text input, helping to strengthen online security.
## Quick Start

You can load the fine-tuned model directly from the Hugging Face Hub:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "shogun-the-great/finetuned-bert-phishing-site-classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Enter your login credentials to claim a free reward!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Label 1 corresponds to "Not Safe", label 0 to "Safe"
prediction = outputs.logits.argmax(dim=-1).item()
print("Prediction:", "Not Safe" if prediction == 1 else "Safe")
```
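If you also want a confidence score, you can turn the logits into probabilities with a softmax. A minimal sketch (the hard-coded logits stand in for `model(**inputs).logits`, and index 1 is assumed to mean "Not Safe", as in the snippet above):

```python
import torch
import torch.nn.functional as F

# Stand-in logits; in practice use model(**inputs).logits
logits = torch.tensor([[-1.2, 2.3]])

# Softmax converts raw logits into probabilities over the two classes
probs = F.softmax(logits, dim=-1)
label = "Not Safe" if probs.argmax(dim=-1).item() == 1 else "Safe"
confidence = probs.max().item()
print(f"Prediction: {label} (confidence {confidence:.1%})")
```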
## Features

### Model Details

This model is a fine-tuned version of [BERT-Base-Uncased](https://huggingface.co/google-bert/bert-base-uncased) for phishing site classification. It predicts whether a website is "Safe" or "Not Safe" based on textual input.
| Property | Details |
| --- | --- |
| Developed by | [shogun-the-great](https://huggingface.co/shogun-the-great) |
| Model Type | Binary classification ("Safe" vs. "Not Safe") |
| Language(s) | English |
| License | Apache-2.0 (or specify your license) |
| Finetuned from model | google-bert/bert-base-uncased |
### Model Sources

- Dataset: [shawhin/phishing-site-classification](https://huggingface.co/datasets/shawhin/phishing-site-classification)
## Documentation

### Uses

#### Direct Use
This model can be directly used for phishing detection by classifying text into two categories: "Safe" and "Not Safe." Typical use cases include:
- Integrating with browser extensions for real-time website classification.
- Analyzing textual data for phishing indicators.
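For batch-style analysis, logits for several inputs can be mapped to labels in one pass. A sketch under the same label convention as the Quick Start (the `logits_to_labels` helper and the stand-in logits are illustrative, not part of the model's API):

```python
import torch

def logits_to_labels(logits: torch.Tensor) -> list[str]:
    """Map a batch of [N, 2] logits to labels; index 1 = "Not Safe"."""
    ids = logits.argmax(dim=-1).tolist()
    return ["Not Safe" if i == 1 else "Safe" for i in ids]

# Stand-in logits for three sites; in practice: model(**batch).logits
batch_logits = torch.tensor([[2.0, -1.0], [-0.5, 3.1], [1.2, 0.4]])
labels = logits_to_labels(batch_logits)
print(labels)
```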
#### Downstream Use
Users can fine-tune the model further for related binary classification tasks or on datasets from similar domains.
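One common downstream recipe is to freeze the encoder and train only the classification head on the new data. A minimal sketch (the `freeze_backbone` helper and the tiny demo module are illustrative, not part of this repository; with the real model, every parameter outside the `classifier` head would be frozen):

```python
import torch.nn as nn

def freeze_backbone(model: nn.Module, head_prefix: str = "classifier") -> int:
    """Disable gradients for every parameter outside the classifier head.
    Returns the number of parameter tensors frozen."""
    frozen = 0
    for name, param in model.named_parameters():
        if not name.startswith(head_prefix):
            param.requires_grad = False
            frozen += 1
    return frozen

# Tiny stand-in module mimicking an encoder + classifier layout
demo = nn.Sequential()
demo.add_module("encoder", nn.Linear(4, 4))
demo.add_module("classifier", nn.Linear(4, 2))
n_frozen = freeze_backbone(demo)
```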
#### Out-of-Scope Use
This model might not perform well for:
- Non-English text.
- Adversarial phishing attacks or heavily obfuscated text.
- Tasks unrelated to text-based classification.
### Bias, Risks, and Limitations

#### Bias
The model's predictions reflect the dataset used during fine-tuning; any biases in the training data may carry over into its predictions.
#### Risks
- False positives: Legitimate websites flagged as phishing.
- False negatives: Some phishing sites might not be detected.
- Potential vulnerabilities to adversarial examples.
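The false-positive/false-negative trade-off can be tuned by thresholding the phishing probability instead of taking the argmax. A sketch (the `classify_with_threshold` helper and the example logits are illustrative; lowering the threshold catches more phishing sites at the cost of more false positives):

```python
import torch
import torch.nn.functional as F

def classify_with_threshold(logits: torch.Tensor, threshold: float = 0.5) -> str:
    """Flag as "Not Safe" when p(phishing) exceeds `threshold` (index 1)."""
    p_not_safe = F.softmax(logits, dim=-1)[0, 1].item()
    return "Not Safe" if p_not_safe > threshold else "Safe"

# Borderline example: p(Not Safe) is roughly 0.35
logits = torch.tensor([[0.4, -0.2]])
print(classify_with_threshold(logits, threshold=0.5))  # argmax-equivalent
print(classify_with_threshold(logits, threshold=0.3))  # stricter setting
```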
#### Recommendations

> **Important Note:** Regularly update the training data and model to keep pace with emerging phishing patterns.
> **Usage Tip:** Use the model in combination with other security measures for more robust phishing detection.
## License

This model is licensed under Apache-2.0 (or specify your license).