PhishMail - BERT Model for Phishing Detection
This repository provides a fine-tuned BERT model for detecting phishing emails. The model classifies an email as phishing or legitimate based on its body text, which can help strengthen email security.
Quick Start
Prerequisites
Install the required libraries with the following command. The leading `!` is for notebook cells (for example Jupyter or Colab); omit it in a regular shell:
!pip install transformers torch
Loading the Model
from transformers import BertForSequenceClassification, BertTokenizer
import torch
model_name = 'jagan-raj/PhishMail'
model = BertForSequenceClassification.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)
model.eval()
Making Predictions
email_text = "Your email content here"
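inputs = tokenizer(
    email_text,
    return_tensors="pt",
    truncation=True,        # cut inputs that exceed the model's maximum length
    padding='max_length'    # pad shorter inputs up to that maximum length
)
# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
# Label 1 is treated as phishing, label 0 as legitimate
result = "This is a phishing email." if predictions.item() == 1 else "This is a legitimate email."
print(f"Prediction: {result}")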
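The `argmax` step yields only a hard label. When a confidence score is useful, the two logits can be converted into a probability with a softmax. The following is a minimal, framework-free sketch: the helper name `phishing_probability`, the example logits, and the 0.5 threshold are illustrative assumptions, and it relies on the label convention used above (index 1 = phishing).

```python
import math

def phishing_probability(logits):
    """Return the softmax probability of class 1 (phishing) for a pair of logits."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

# Hypothetical logits for a single email (not real model output).
example_logits = [-1.2, 2.3]
prob = phishing_probability(example_logits)
label = "phishing" if prob >= 0.5 else "legitimate"
print(f"P(phishing) = {prob:.3f} -> {label}")
```

With the model loaded as shown earlier, the input would be `outputs.logits[0].tolist()`. A threshold other than 0.5 may suit deployments where false positives and false negatives have different costs.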
⨠Features
- Accurate Detection: The model can accurately classify emails as phishing or legitimate by analyzing the email body text.
- Contextual Understanding: It leverages the power of BERT to understand the context of the email text, enabling it to detect subtle phishing cues.
Installation
Use the following command to install the required dependencies:
!pip install transformers torch
Usage Examples
Basic Usage
from transformers import BertForSequenceClassification, BertTokenizer
import torch
model_name = 'jagan-raj/PhishMail'
model = BertForSequenceClassification.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)
model.eval()
email_text = "Your email content here"
inputs = tokenizer(email_text, return_tensors="pt", truncation=True, padding='max_length')
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
result = "This is a phishing email." if predictions.item() == 1 else "This is a legitimate email."
print(f"Prediction: {result}")
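With `truncation=True`, text beyond the tokenizer's maximum length (512 tokens for `bert-base-uncased`) is cut off, so the model never sees the tail of a long email. One possible workaround, which is not part of the original model card, is to split a long email into overlapping word windows, classify each window separately, and flag the email if any window is flagged. The sketch below covers only the splitting step; the helper name `split_into_windows` and the window sizes are hypothetical, and word counts only approximate token counts.

```python
def split_into_windows(text, window=200, overlap=50):
    """Split text into overlapping windows of whitespace-separated words."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    words = text.split()
    if not words:
        return []
    step = window - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reaches the end of the text
    return windows

# Synthetic 450-word text standing in for a long email body.
long_email = " ".join(f"word{i}" for i in range(450))
chunks = split_into_windows(long_email)
print(len(chunks), "windows")
```

Each string in `chunks` could then be passed through the tokenizer and model exactly as `email_text` is in the example above.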
Documentation
Model Details
| Property | Details |
|---|---|
| Model Type | BERT (Bidirectional Encoder Representations from Transformers) |
| Task | Phishing detection (binary classification: phishing vs. legitimate) |
| Fine-Tuning | The model was fine-tuned on a curated dataset of phishing and legitimate emails, chosen for diversity in email content and structure. |
| Objective | To enhance email security by accurately identifying phishing attempts using contextual understanding of email body text. |
| Developed by | Jagan Raj |
| Base Model | google-bert/bert-base-uncased |
| License | Free for all |
| Dataset | zefang-liu/phishing-email-dataset |
Evaluation
Training ran for 3 epochs (6,297 steps) and finished with a training loss of about 0.071. The raw trainer output is:
TrainOutput(global_step=6297, training_loss=0.07093968526965307, metrics={'train_runtime': 5545.442, 'train_samples_per_second': 9.08, 'train_steps_per_second': 1.136, 'total_flos': 1.32489571926528e+16, 'train_loss': 0.07093968526965307, 'epoch': 3.0})
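A few quantities that the `TrainOutput` does not state directly can be estimated from it with simple arithmetic. The sketch below uses only the reported numbers; because the reported rates are rounded, the results are rough estimates rather than the author's actual training configuration.

```python
# Values copied from the TrainOutput above.
train_runtime = 5545.442       # seconds
samples_per_second = 9.08
steps_per_second = 1.136
global_step = 6297
epochs = 3.0

# Samples processed per optimizer step approximates the effective batch size.
effective_batch_size = samples_per_second / steps_per_second
steps_per_epoch = global_step / epochs
# Total samples processed, divided by epochs, approximates the training-set size.
approx_train_samples = samples_per_second * train_runtime / epochs

print(f"effective batch size ~ {effective_batch_size:.1f}")
print(f"steps per epoch = {steps_per_epoch:.0f}")
print(f"approx. training samples ~ {approx_train_samples:.0f}")
print(f"runtime ~ {train_runtime / 60:.0f} minutes")
```

Note that these are training-time figures only; they say nothing about accuracy on held-out emails.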
License
The model is free for all to use.
Author