phishing-email-detection-distilbert_v2.4.1 open-source model: accurate phishing-risk assessment for emails and URLs

Phishing Email Detection Distilbert V2.4.1

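Developed by cybersectony

This model is built on the DistilBERT architecture and designed for multilabel classification. It judges whether an email or URL is safe or poses a phishing risk.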

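Text Classification

Transformers

Language: English

License: Apache-2.0

Tags: #phishing-email-detection #multilabel-classification #high-accuracy-detection

Downloads: 630

Release date: 10/27/2024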

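Model Overview

This is a security model built specifically to detect phishing emails and URLs. It distinguishes legitimate content from phishing content, helping users guard against phishing attacks.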

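Model Features

Efficient and lightweight

Built on the DistilBERT architecture, it reduces model size and compute requirements while maintaining high performance.

Multilabel classification

Distinguishes several types of email and URL content: legitimate email, phishing URL, legitimate URL, and alternative phishing URL.

High accuracy

Reaches 99.58% accuracy on the test set, with F1 score, precision, and recall all above 99.5%.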
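
The four figures quoted above are standard classification metrics. As a rough illustration of how they relate to one another, the sketch below computes accuracy and macro-averaged precision, recall, and F1 from lists of true and predicted labels. The function name `classification_metrics` and the sample labels are hypothetical, and macro averaging is an assumption: the model card does not say which averaging scheme or evaluation code was used.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro averaging is an assumption,
    not something the model card specifies.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count true positives, false positives, and false negatives per label.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1
            fn[truth] += 1

    def safe_div(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(safe_div(2 * precision * recall, precision + recall))

    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1_scores) / len(labels),
    }


# Toy labels only; these are not the model's evaluation data.
metrics = classification_metrics(
    ["phishing_url", "phishing_url", "legitimate_url", "legitimate_url"],
    ["phishing_url", "legitimate_url", "legitimate_url", "legitimate_url"],
)
print(metrics)
```

Accuracy alone can look high on an imbalanced test set, which is why precision and recall are worth reading alongside it.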

Model Capabilities

Phishing email detection

Malicious URL identification

Multilabel text classification

Network security analysis

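Use Cases

Enterprise security

Employee email screening

Automatically scans emails received by employees to identify potential phishing attacks.

Can significantly reduce the risk of employees clicking on phishing emails.

Security gateway integration

Integrates into an enterprise network security gateway to detect phishing content in emails and web pages in real time.

Raises the organization's overall level of network security protection.

Personal security

Personal email client plugin

Provides an email security-checking plugin for individual users.

Helps individual users spot suspicious emails and links.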
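
For the screening and gateway use cases above, the model's per-class probabilities have to be turned into an action. A minimal sketch of one possible policy follows, assuming the four label names used in the usage example later in this document. The function name `triage`, the three action names, and both thresholds are hypothetical and would need tuning against real mail traffic.

```python
# Label names taken from the usage example; the two phishing classes are summed
# because the example applies softmax, so the four probabilities add up to 1.
PHISHING_LABELS = ("phishing_url", "phishing_url_alt")


def triage(probabilities, quarantine_at=0.90, flag_at=0.50):
    """Map class probabilities to 'quarantine', 'flag', or 'deliver'.

    Hypothetical policy: both thresholds are illustrative defaults.
    """
    if not 0.0 <= flag_at <= quarantine_at <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= flag_at <= quarantine_at <= 1")

    phishing_score = sum(probabilities.get(label, 0.0) for label in PHISHING_LABELS)

    if phishing_score >= quarantine_at:
        action = "quarantine"
    elif phishing_score >= flag_at:
        action = "flag"
    else:
        action = "deliver"
    return {"action": action, "phishing_score": phishing_score}


# Made-up probabilities for illustration, not real model output.
decision = triage({
    "legitimate_email": 0.02,
    "phishing_url": 0.90,
    "legitimate_url": 0.03,
    "phishing_url_alt": 0.05,
})
print(decision["action"])
```

Keeping a middle "flag" band lets borderline messages reach a human reviewer instead of being silently dropped or delivered.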

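🚀 DistilBERT-Based Phishing Email Detection Model

Built on DistilBERT, this model performs multilabel classification of emails and URLs to judge whether they are safe or pose a phishing risk, supporting network security defenses.

✨ Key Features

Built on the DistilBERT architecture, with efficient feature extraction.
Fine-tuned for multilabel classification of emails and URLs.
Fine-tuned with the Hugging Face Trainer API, which simplifies the training process.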

📦 Installation

pip install transformers
pip install torch

💻 Usage Examples

Basic usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model from the Hugging Face Hub
MODEL_NAME = "cybersectony/phishing-email-detection-distilbert_v2.4.1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def predict_email(email_text):
    # Preprocess and tokenize
    inputs = tokenizer(
        email_text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )
    
    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    
    # Get probabilities for each class
    probs = predictions[0].tolist()
    
    # Create labels dictionary
    labels = {
        "legitimate_email": probs[0],
        "phishing_url": probs[1],
        "legitimate_url": probs[2],
        "phishing_url_alt": probs[3]
    }
    
    # Determine the most likely classification
    max_label = max(labels.items(), key=lambda x: x[1])
    
    return {
        "prediction": max_label[0],
        "confidence": max_label[1],
        "all_probabilities": labels
    }
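
Because the tokenizer call above uses `truncation=True` with `max_length=512`, anything past the first 512 tokens of a long email is never seen by the model. One possible workaround, sketched below, is to split the text into overlapping word windows, classify each window, and keep the most suspicious result. The helper names `split_into_chunks` and `scan_long_email` are hypothetical, and word counts are only a rough stand-in for token counts (URLs in particular can expand into many tokens), so the window size here is an assumption rather than a guarantee of staying under the limit.

```python
def split_into_chunks(text, max_words=200, overlap=20):
    """Split text into overlapping windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []

    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def scan_long_email(text, classify, max_words=200, overlap=20):
    """Classify each chunk and return the result with the highest phishing probability.

    `classify` is any callable returning a dict shaped like the one
    predict_email returns (with an "all_probabilities" entry).
    """
    def phishing_score(result):
        probs = result["all_probabilities"]
        return probs["phishing_url"] + probs["phishing_url_alt"]

    results = [classify(chunk) for chunk in split_into_chunks(text, max_words, overlap)]
    return max(results, key=phishing_score) if results else None
```

With the model loaded, this would be called as `scan_long_email(email_text, predict_email)`. Taking the maximum is a conservative choice: a single suspicious passage is enough to mark the whole message.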

Advanced usage

# Example usage
email = """
Dear User,
Your account security needs immediate attention. Please verify your credentials.
Click here: http://suspicious-link.com
"""

result = predict_email(email)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll probabilities:")
for label, prob in result['all_probabilities'].items():
    print(f"{label}: {prob:.2%}")