🚀 Phishing Email Detection Model Based on DistilBERT
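This model is built on DistilBERT and fine-tuned for multi-label classification of emails and URLs, determining whether each input is safe or a potential phishing risk.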
🚀 Quick Start
Installation

```bash
pip install transformers
pip install torch
```
Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Replace "your-username/model-name" with the model's Hub repository ID
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")
model = AutoModelForSequenceClassification.from_pretrained("your-username/model-name")
model.eval()  # inference mode (disables dropout)

def predict_email(email_text):
    # Tokenize, truncating long inputs to DistilBERT's 512-token limit
    inputs = tokenizer(
        email_text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )

    # Forward pass without gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Probabilities for the first (and only) input in the batch
    probs = predictions[0].tolist()

    # Map output indices to label names
    labels = {
        "legitimate_email": probs[0],
        "phishing_url": probs[1],
        "legitimate_url": probs[2],
        "phishing_url_alt": probs[3]
    }

    # Pick the label with the highest probability
    max_label = max(labels.items(), key=lambda x: x[1])

    return {
        "prediction": max_label[0],
        "confidence": max_label[1],
        "all_probabilities": labels
    }

# Example usage
email = """
Dear User,
Your account security needs immediate attention. Please verify your credentials.
Click here: http://suspicious-link.com
"""

result = predict_email(email)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll probabilities:")
for label, prob in result['all_probabilities'].items():
    print(f"{label}: {prob:.2%}")
```
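The post-processing at the end of `predict_email` (softmax over the logits, then picking the largest probability) does not depend on the model itself, so it can be exercised without downloading the checkpoint. Below is a minimal pure-Python sketch of that step; the helper names `softmax` and `summarize` are hypothetical, the logits are made up, and the label order is copied from the example above. In `transformers`, a checkpoint's own index-to-name mapping is normally stored in `model.config.id2label`; whether this checkpoint stores meaningful names there is not confirmed.

```python
import math

# Label order copied from predict_email above; assumed to match the checkpoint.
LABELS = ["legitimate_email", "phishing_url", "legitimate_url", "phishing_url_alt"]


def softmax(logits):
    """Numerically stable softmax over a list of raw logits (hypothetical helper)."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits, labels=LABELS):
    """Mirror predict_email's post-processing: softmax, then take the top label."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = softmax(logits)
    all_probs = dict(zip(labels, probs))
    top_label, top_prob = max(all_probs.items(), key=lambda kv: kv[1])
    return {"prediction": top_label, "confidence": top_prob, "all_probabilities": all_probs}


# Made-up logits for illustration; not real model output.
result = summarize([0.2, 3.1, -1.0, 0.5])
print(result["prediction"])  # phishing_url
```

Because softmax is monotonic, the top label is always the one with the largest raw logit; the softmax only changes how `confidence` is expressed.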
✨ Key Features
- Built on the DistilBERT architecture and fine-tuned for multi-label classification.
- Classifies emails and URLs as safe or potentially phishing.
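The features above describe the task as multi-label, whereas the quick-start code applies softmax, which treats the four labels as mutually exclusive and forces their probabilities to sum to 1. If the checkpoint was actually trained for multi-label output (in `transformers` this is usually signalled by `problem_type="multi_label_classification"` in the model config; not confirmed for this model), each label would instead be scored independently with a sigmoid and compared against a threshold. A minimal sketch of that decoding, assuming a hypothetical 0.5 threshold, made-up logits, and the label order from the quick-start example:

```python
import math

# Label order copied from the quick-start example; assumed, not verified.
LABELS = ["legitimate_email", "phishing_url", "legitimate_url", "phishing_url_alt"]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for large negative x
    return e / (1.0 + e)


def multilabel_predict(logits, labels=LABELS, threshold=0.5):
    """Score each label independently; return every label at or above threshold."""
    scores = {label: sigmoid(z) for label, z in zip(labels, logits)}
    active = [label for label, s in scores.items() if s >= threshold]
    return active, scores


# Made-up logits for illustration; not real model output.
active, scores = multilabel_predict([-2.0, 3.1, -1.0, 1.5])
print(active)  # ['phishing_url', 'phishing_url_alt']
```

Unlike the softmax version, this can return zero, one, or several labels for the same input, and the per-label scores do not need to sum to 1.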
📚 Detailed Documentation
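Key Specifications

| Attribute | Details |
|---|---|
| Base architecture | DistilBERT |
| Task | Multi-label classification |
| Fine-tuning framework | Hugging Face Trainer API |
| Training epochs | 3 |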
Performance Metrics

| Metric | Value (%) |
|---|---|
| F1 score | 97.717 |
| Accuracy | 97.716 |
| Precision | 97.736 |
| Recall | 97.717 |
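The card does not state how precision, recall, and F1 were averaged across classes. For reference, the sketch below computes accuracy plus support-weighted precision, recall, and F1 (each class weighted by its share of the true labels, similar in spirit to `average="weighted"` in scikit-learn). Under this scheme weighted recall equals accuracy by construction; the table's recall and accuracy differ only in the third decimal place, so the averaging scheme cannot be determined from these numbers alone. The function name `weighted_metrics` and the toy labels are hypothetical and are not the model's evaluation data.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    support = Counter(y_true)    # true occurrences per class
    predicted = Counter(y_pred)  # predicted occurrences per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives

    precision = recall = f1 = 0.0
    for cls, sup in support.items():
        p = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        r = correct[cls] / sup
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = sup / n  # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = ["phishing", "phishing", "legitimate", "legitimate", "legitimate"]
y_pred = ["phishing", "legitimate", "legitimate", "legitimate", "phishing"]
print(weighted_metrics(y_true, y_pred))
```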
Dataset Details
The model was trained on a custom dataset of emails and URLs labeled as either legitimate or phishing. The dataset is available on the Hugging Face Hub at cybersectony/PhishingEmailDetection.
📄 License
This project is licensed under the Apache-2.0 license.