phishing-email-detection-distilbert_v2.4.1 open-source model: assessing phishing risk in emails and URLs


Phishing Email Detection Distilbert V2.4.1

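Developed by cybersectony

This model is built on the DistilBERT architecture and designed for multi-label classification. It judges whether an email or URL is safe or carries phishing risk.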

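Text classification

Transformers

Language: English · License: Apache-2.0 · #phishing-email-detection #multi-label-classification #high-accuracy-detection

Downloads: 630

Released: 10/27/2024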

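Model overview

A security model dedicated to detecting phishing emails and URLs. It separates legitimate content from phishing content to help users guard against phishing attacks.

Model features

Efficient and lightweight

Built on the DistilBERT architecture, which reduces model size and compute requirements while retaining strong performance.

Multi-label classification

Distinguishes several categories of email and URL: legitimate emails, phishing links, legitimate links, and alternative phishing links.

High accuracy

Reaches 99.58% accuracy on the test set, with F1 score, precision, and recall all above 99.5%.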
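
The four metrics quoted above are related but not interchangeable. As a minimal sketch of how they are derived from a list of true and predicted labels, the function below uses macro-averaging across classes. The averaging method is an assumption: the source does not say how its precision, recall, and F1 were aggregated, and `classification_metrics` is a hypothetical helper name.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for illustration only.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def ratio(num, den):
        # Avoid division by zero for classes that never occur.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = ratio(tp[label], tp[label] + fp[label])
        r = ratio(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(ratio(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

On a heavily imbalanced test set, accuracy alone can look high while recall on the phishing classes is poor, which is why the per-class ratios are worth inspecting alongside it.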

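Model capabilities

Phishing email detection

Malicious URL identification

Multi-label text classification

Cybersecurity analysis

Use cases

Enterprise security

Employee email screening

Automatically scans emails received by employees and flags potential phishing attacks.

Can reduce the risk of employees clicking on phishing emails.

Security gateway integration

Integrates into an enterprise network security gateway to detect phishing content in emails and web pages in real time.

Raises the organization's overall level of network protection.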
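
For screening or gateway use, the per-class probabilities have to be turned into an action. The sketch below is one possible policy, not something the source prescribes: it sums the two phishing classes (label names taken from the usage example later in this document) and compares the total against a threshold. `gateway_decision`, the action names, and the 0.5 default are all hypothetical and would need tuning on real traffic.

```python
# Label names follow the usage example in this document.
PHISHING_LABELS = ("phishing_url", "phishing_url_alt")


def gateway_decision(probabilities, block_threshold=0.5):
    """Map class probabilities to a gateway action.

    block_threshold is a hypothetical default, not a value from the source.
    """
    # Total probability mass assigned to either phishing class.
    phishing_score = sum(probabilities.get(label, 0.0) for label in PHISHING_LABELS)
    action = "quarantine" if phishing_score >= block_threshold else "deliver"
    return {"action": action, "phishing_score": phishing_score}


example = gateway_decision({
    "legitimate_email": 0.10,
    "phishing_url": 0.60,
    "legitimate_url": 0.05,
    "phishing_url_alt": 0.25,
})
print(example["action"])
```

Summing the phishing classes avoids the case where two phishing labels split the probability mass and a legitimate label wins the argmax by a narrow margin.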

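Personal security

Personal email client plugin

Provides an email security checking plugin for individual users.

Helps individual users identify suspicious emails and links.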

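🚀 DistilBERT-based phishing email detection model

This model is built on DistilBERT and performs multi-label classification of emails and URLs, judging whether they are safe or carry phishing risk.

✨ Key features

Built on the DistilBERT architecture for efficient feature extraction.
Fine-tuned for multi-label classification of emails and URLs.
Fine-tuned with the Hugging Face Trainer API.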

📦 Installation

pip install transformers
pip install torch

💻 Usage examples

Basic usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
MODEL_NAME = "cybersectony/phishing-email-detection-distilbert_v2.4.1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def predict_email(email_text):
    # Preprocess and tokenize
    inputs = tokenizer(
        email_text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )
    
    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    
    # Get probabilities for each class
    probs = predictions[0].tolist()
    
    # Create labels dictionary
    labels = {
        "legitimate_email": probs[0],
        "phishing_url": probs[1],
        "legitimate_url": probs[2],
        "phishing_url_alt": probs[3]
    }
    
    # Determine the most likely classification
    max_label = max(labels.items(), key=lambda x: x[1])
    
    return {
        "prediction": max_label[0],
        "confidence": max_label[1],
        "all_probabilities": labels
    }
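
To make the softmax and arg-max steps in the function above concrete without loading the model, here is a dependency-free sketch of the same arithmetic. The logits are made-up numbers for illustration, and `softmax` and `top_label` are hypothetical helpers; the label order is copied from the dictionary above.

```python
import math

# Label order copied from the usage example in this document.
LABELS = ["legitimate_email", "phishing_url", "legitimate_url", "phishing_url_alt"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits, not real model output.
label, confidence = top_label([-1.2, 3.4, 0.3, 1.1])
print(label, f"{confidence:.2%}")
```

Because softmax forces the four probabilities to sum to 1, the function always returns exactly one winning label, so the reported confidence is relative to the other three classes and not an independent per-label score.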

Advanced usage

# Example usage
email = """
Dear User,
Your account security needs immediate attention. Please verify your credentials.
Click here: http://suspicious-link.com
"""

result = predict_email(email)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll probabilities:")
for label, prob in result['all_probabilities'].items():
    print(f"{label}: {prob:.2%}")
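
Since the model is described as handling both emails and URLs, one option is to score each embedded link separately in addition to the full message. The sketch below assumes that a bare URL can be passed as plain text to the same prediction function; the source does not confirm this. `extract_urls`, `score_email_and_urls`, and the regular expression are hypothetical, and the pattern is deliberately simple and not a full URL parser.

```python
import re

# Simple pattern for http(s) links; an illustrative assumption, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def extract_urls(text):
    """Return unique http(s) URLs in order of first appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url and url not in seen:
            seen.append(url)
    return seen


def score_email_and_urls(email_text, predict):
    """Score the full email and each extracted URL.

    predict is any callable mapping text to a result, e.g. predict_email above.
    """
    return {
        "email": predict(email_text),
        "urls": {url: predict(url) for url in extract_urls(email_text)},
    }
```

With the model loaded, `score_email_and_urls(email, predict_email)` would return one result for the message body and one per link, which makes it easier to see whether a verdict is driven by the wording or by a specific URL.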