PhishMail Open-Source Phishing Email Detection Model - Accurately Distinguish Phishing and Normal Emails


Developed by jagan-raj

A BERT-based fine-tuned phishing email detection model that accurately identifies phishing emails versus legitimate emails.

Text Classification

Transformers

English

#Phishing email detection #BERT fine-tuning #Email security


Release Time: 1/11/2025

Model Overview

This model analyzes email content and leverages BERT's contextual understanding to classify emails as phishing or legitimate, enhancing email security.

Model Features

Contextual understanding

Utilizes BERT's bidirectional Transformer architecture to understand contextual relationships in email content and identify hidden clues in phishing emails.

High accuracy

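Fine-tuned on a phishing email dataset for 3 epochs, reaching a final training loss of about 0.07. Training loss alone does not measure accuracy on unseen emails, and no held-out evaluation metrics are reported.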

Ease of use

Provides simple API interfaces that can be integrated into existing systems with just a few lines of code.

Model Capabilities

Text classification

Phishing email detection

Natural language understanding

Use Cases

Email security

Enterprise email filtering

Integrated into corporate email systems to automatically filter potential phishing emails.

Reduces the risk of employees clicking on phishing emails.

Personal email protection

Used in personal email client plugins to flag suspicious emails.

Enhances personal cybersecurity protection.
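One way a mail filter or client plugin might act on the model's output is a simple threshold policy. The sketch below is illustrative only: the triage function, the action names, and the default thresholds are hypothetical and would need tuning against real mail before use.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    FLAG = "flag"
    QUARANTINE = "quarantine"


def triage(phishing_probability: float,
           flag_at: float = 0.5,
           quarantine_at: float = 0.9) -> Action:
    """Map a phishing probability in [0, 1] to a mail-handling action.

    The default thresholds are placeholders, not values from the model card.
    """
    if not 0.0 <= phishing_probability <= 1.0:
        raise ValueError("phishing_probability must be between 0 and 1")
    if phishing_probability >= quarantine_at:
        return Action.QUARANTINE
    if phishing_probability >= flag_at:
        return Action.FLAG
    return Action.DELIVER
```

A corporate gateway would likely quarantine high-scoring mail, while a personal plugin might only use the FLAG action.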

🚀 PhishMail - BERT Model for Phishing Detection

This repository presents a fine-tuned BERT model for detecting phishing emails. The model analyzes the body text of an email and classifies it as either phishing or legitimate.
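Because the model works on body text, a raw message usually needs its headers and MIME structure stripped first. The following is a minimal sketch using Python's standard email package; the extract_body_text helper is a hypothetical name, and ignoring HTML-only messages is a simplifying assumption rather than anything the model card prescribes.

```python
from email import message_from_string, policy


def extract_body_text(raw_message: str) -> str:
    """Return the plain-text body of a raw email, or "" if there is none."""
    msg = message_from_string(raw_message, policy=policy.default)
    # Prefer the text/plain part; HTML-only messages yield no body here.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""
    return part.get_content().strip()
```

The returned string is what would be passed to the tokenizer as the email text.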

🚀 Quick Start

Prerequisites

Install the required libraries. The leading '!' runs the command from a notebook cell (Jupyter or Colab); omit it in a regular shell:

!pip install transformers torch

Loading the Model

from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Specify the Hugging Face model repository name
model_name = 'jagan-raj/PhishMail'

# Load the fine-tuned BERT model for phishing detection
model = BertForSequenceClassification.from_pretrained(model_name)

# Load the corresponding tokenizer for the fine-tuned model
tokenizer = BertTokenizer.from_pretrained(model_name)

# Set the model to evaluation mode for inference
model.eval()

Making Predictions

# Input the email text for classification
email_text = "Your email content here"

# Tokenize and preprocess the input text
# Converts the email text into token IDs, applies truncation/padding, and creates a tensor
inputs = tokenizer(
    email_text,
    return_tensors="pt",        # Output tensors in PyTorch format
    truncation=True,            # Truncate the text if it exceeds max_length
    max_length=512,             # BERT's maximum input length in tokens
    padding='max_length'        # Pad the text to max_length
)

# Make a prediction using the model
with torch.no_grad():           # Disable gradient calculations for faster inference
    outputs = model(**inputs)   # Get model outputs
    logits = outputs.logits     # Extract raw prediction scores (logits)
    predictions = torch.argmax(logits, dim=-1)  # Determine the predicted class (0 or 1)

# Interpret the prediction result
# Map the prediction to its corresponding label: 1 for "Phishing", 0 for "Legitimate"
result = "This is a phishing email." if predictions.item() == 1 else "This is a legitimate email."

# Print the prediction result
print(f"Prediction: {result}")
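The argmax step gives only a hard label. If a confidence score is wanted, the two logits can be turned into probabilities with a softmax. The sketch below uses plain Python so it runs without the model; the helper names are hypothetical, and class index 1 is taken to mean "phishing" as in the label mapping above.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phishing_probability(logits: list[float]) -> float:
    """Probability of class 1 ("phishing"), given [legitimate, phishing] logits."""
    return softmax(logits)[1]
```

With the objects from the snippet above, phishing_probability(outputs.logits[0].tolist()) would give the score; torch.softmax(logits, dim=-1) is the equivalent tensor-level call.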

✨ Features

Accurate Detection: The model can accurately classify emails as phishing or legitimate by analyzing the email body text.
Contextual Understanding: It leverages the power of BERT to understand the context of the email text, enabling it to detect subtle phishing cues.

📦 Installation

Use the following command to install the required dependencies (omit the leading '!' outside a notebook):

!pip install transformers torch

💻 Usage Examples

Basic Usage

from transformers import BertForSequenceClassification, BertTokenizer
import torch

model_name = 'jagan-raj/PhishMail'
model = BertForSequenceClassification.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)
model.eval()

email_text = "Your email content here"
inputs = tokenizer(email_text, return_tensors="pt", truncation=True, max_length=512, padding='max_length')
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
result = "This is a phishing email." if predictions.item() == 1 else "This is a legitimate email."
print(f"Prediction: {result}")
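With truncation=True, anything past the model's maximum input length (512 tokens for bert-base-uncased) is dropped, so a long email is only partly seen. One possible workaround, not part of the original card, is to score overlapping windows and keep the highest phishing score. The sketch below splits on whitespace as a rough stand-in for tokens; the window sizes are hypothetical, and score_fn is a placeholder for any callable that returns a phishing probability for a piece of text.

```python
from typing import Callable


def split_into_windows(text: str, window: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    words = text.split()
    if not words:
        return []
    step = window - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def score_long_email(text: str, score_fn: Callable[[str], float], **kwargs) -> float:
    """Return the highest per-window score, or 0.0 for empty text."""
    windows = split_into_windows(text, **kwargs)
    return max((score_fn(chunk) for chunk in windows), default=0.0)
```

Taking the maximum favors catching a phishing passage anywhere in the message, at the cost of more false positives than averaging would give.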

📚 Documentation

Model Details

Property	Details
Model Type	BERT (Bidirectional Encoder Representations from Transformers)
Task	Phishing detection (Binary classification: phishing vs. legitimate)
Fine-tuning	The model was fine-tuned on a curated dataset of phishing and legitimate emails, chosen for diversity in email content and structure.
Objective	To enhance email security by accurately identifying phishing attempts using contextual understanding of email body text.
Developed by	Jagan Raj
Base model	google-bert/bert-base-uncased
License	Free for all
Dataset	zefang-liu/phishing-email-dataset

Evaluation

Only the training summary returned by the Hugging Face Trainer is reported:

TrainOutput(global_step=6297, training_loss=0.07093968526965307, metrics={'train_runtime': 5545.442, 'train_samples_per_second': 9.08, 'train_steps_per_second': 1.136, 'total_flos': 1.32489571926528e+16, 'train_loss': 0.07093968526965307, 'epoch': 3.0})
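A few quantities the card does not state can be estimated from these numbers. The arithmetic below is a rough reading, not a reported result: train_samples_per_second is rounded, so the sample counts are approximate, and the batch size is inferred (it would also absorb any gradient accumulation or multi-device training).

```python
# Values copied from the TrainOutput line above.
global_step = 6297
epochs = 3.0
train_runtime_s = 5545.442
samples_per_second = 9.08

steps_per_epoch = global_step / epochs                 # 2099.0
samples_seen = samples_per_second * train_runtime_s    # about 50,353 across all epochs
samples_per_epoch = samples_seen / epochs              # about 16,784
effective_batch_size = samples_seen / global_step      # about 8.0

print(f"steps per epoch:      {steps_per_epoch:.0f}")
print(f"samples per epoch:    {samples_per_epoch:.0f}")
print(f"effective batch size: {effective_batch_size:.2f}")
```

Read this way, the run covered roughly 16,800 training emails per epoch with an effective batch size of about 8.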

📄 License

The model is free for all to use.

👨‍💻 Author

Jagan Raj
