bert-spam-classification-model open-source model - accurately distinguishes English spam SMS from legitimate SMS


Bert Spam Classification Model

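Developed by fzn0x

An English SMS spam classifier built by fine-tuning the bert-base-uncased model. It accurately distinguishes spam messages from legitimate ("ham") messages.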

Text Classification

Safetensors

English | Open-source license: MIT | #EnglishSMSClassification #BERTFineTuning #SpamFiltering

Downloads: 209

Release date: 4/9/2025

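Model Overview

This model is based on the BERT architecture and is built specifically to classify English SMS messages as spam or not. It can identify spam such as marketing and scam messages.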

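Model Features

High-accuracy classification

Uses BERT's strong semantic understanding to distinguish spam from legitimate messages accurately

Easy to use

Provides an out-of-the-box prediction interface that can be integrated into an application with just a few lines of code

Lightweight model

Built on BERT-base (about 110M parameters) rather than a larger variant, reducing resource consumption while maintaining performance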
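To make the "lightweight" claim concrete, the arithmetic below counts parameters from the published configuration values of bert-base-uncased (hidden size 768, 12 layers, feed-forward size 3072) and bert-large-uncased (hidden size 1024, 24 layers, feed-forward size 4096), both with a 30,522-token vocabulary and 512 positions. It assumes fine-tuning leaves the architecture unchanged apart from a two-class linear head on the pooled output; `bert_param_count` is a hypothetical helper, not a transformers API.

```python
def bert_param_count(vocab_size: int, hidden: int, layers: int, intermediate: int,
                     max_positions: int = 512, type_vocab: int = 2,
                     num_labels: int = 0) -> int:
    """Count weights and biases of a standard BERT encoder plus an optional classifier head."""

    def linear(n_in: int, n_out: int) -> int:
        return n_in * n_out + n_out  # weight matrix + bias vector

    layer_norm = 2 * hidden  # scale (gamma) + shift (beta)

    # Word, position and token-type embeddings, followed by one LayerNorm
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    per_layer = (
        3 * linear(hidden, hidden)        # query, key, value projections
        + linear(hidden, hidden)          # attention output projection
        + layer_norm                      # LayerNorm after attention
        + linear(hidden, intermediate)    # feed-forward expansion
        + linear(intermediate, hidden)    # feed-forward projection back to hidden size
        + layer_norm                      # LayerNorm after feed-forward
    )

    pooler = linear(hidden, hidden)       # dense layer over the [CLS] vector
    head = linear(hidden, num_labels) if num_labels else 0  # hypothetical 2-class head
    return embeddings + layers * per_layer + pooler + head


if __name__ == "__main__":
    base = bert_param_count(30522, 768, 12, 3072, num_labels=2)
    large = bert_param_count(30522, 1024, 24, 4096, num_labels=2)
    print(f"BERT-base:  {base:,}")   # BERT-base:  109,483,778
    print(f"BERT-large: {large:,}")  # BERT-large: 335,143,938
```

By this count the base model has about 109.5M parameters versus about 335.1M for the large variant, roughly a third of the size. For a model loaded with transformers, `model.num_parameters()` reports the actual figure.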

Model Capabilities

English text classification

Spam SMS detection

Natural language understanding

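Use Cases

Communication security

SMS filtering system

Integrate into mobile messaging apps to filter spam SMS automatically

Reduce the number of spam messages users receive

Customer service protection

Identify and block spam sent to customer service systems

Improve customer service efficiency

Data analysis

Spam SMS analysis

Batch-analyze the proportion of spam in an SMS database

Help track spam trends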
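The filtering and batch-analysis use cases above both come down to running the classifier over a collection of messages. A minimal sketch, assuming a classifier callable that returns the strings 'SPAM' or 'HAM' like `model_predict` in the usage section; `triage` and `keyword_stub` are hypothetical names, and the keyword stub only stands in for the model so the snippet runs offline.

```python
from typing import Callable, Iterable


def triage(messages: Iterable[str],
           classify: Callable[[str], str]) -> tuple[list[str], list[str], float]:
    """Split messages into ham and spam lists and return the share that is spam."""
    ham: list[str] = []
    spam: list[str] = []
    for text in messages:
        # classify is expected to return 'SPAM' or 'HAM', like model_predict
        (spam if classify(text) == "SPAM" else ham).append(text)
    total = len(ham) + len(spam)
    ratio = len(spam) / total if total else 0.0  # avoid dividing by zero on empty input
    return ham, spam, ratio


def keyword_stub(text: str) -> str:
    """Hypothetical placeholder classifier so the sketch runs without the model."""
    lowered = text.lower()
    return "SPAM" if "crypto" in lowered or "buy" in lowered else "HAM"


if __name__ == "__main__":
    inbox = [
        "Hello, do you know with this crypto you can be rich? contact us in 88888",
        "Help me richard!",
        "You can buy loopstation for 100$, try buyloopstation.com",
        "Mate, I try to contact your phone, where are you?",
    ]
    ham, spam, ratio = triage(inbox, keyword_stub)
    print(f"{len(spam)} of {len(inbox)} flagged as spam ({ratio:.0%})")  # 2 of 4 flagged as spam (50%)
```

To use the real model, pass `model_predict` instead of `keyword_stub`. For large databases, calling the model once per message is slow; tokenizing several texts in one call and running them as a batch is the usual optimization.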

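🚀 Fine-tuned BERT-base-uncased pretrained model for spam SMS classification

This project is a fine-tuned version of the pretrained BERT-base-uncased model, built specifically to classify SMS messages. It predicts whether a given message is spam, which makes it an efficient building block for SMS processing.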

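🚀 Quick Start

This is my second project in natural language processing (NLP): I fine-tuned the bert-base-uncased model to classify spam SMS. It is a major improvement over my previous project.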
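The card does not show the training data pipeline. As a hedged sketch, assuming the training data is the SMS Spam Collection cited below in its original tab-separated form (one `label<TAB>message` per line; other copies, such as the Kaggle one, may use a different layout), the labels can be encoded so that index 1 means spam, consistent with the `prediction == 1` check in `model_predict`. `load_sms_spam` and `LABEL_TO_ID` are hypothetical names.

```python
import io
from typing import Iterable

# Assumed label encoding: 1 = spam, 0 = ham, matching model_predict's check
LABEL_TO_ID = {"ham": 0, "spam": 1}


def load_sms_spam(lines: Iterable[str]) -> tuple[list[str], list[int]]:
    """Parse 'label<TAB>message' lines into parallel lists of texts and label ids."""
    texts: list[str] = []
    labels: list[int] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        label, text = line.split("\t", 1)  # split on the first tab only
        texts.append(text)
        labels.append(LABEL_TO_ID[label.strip().lower()])
    return texts, labels


if __name__ == "__main__":
    # Two illustrative lines reusing messages from the usage example
    sample = io.StringIO(
        "spam\tYou can buy loopstation for 100$, try buyloopstation.com\n"
        "ham\tMate, I try to contact your phone, where are you?\n"
    )
    texts, labels = load_sms_spam(sample)
    print(labels)  # [1, 0]
```

With a real file, `open(path, encoding="utf-8")` can be passed in place of the `StringIO` sample.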

For the evaluation result logs, see the GitHub repository.
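The evaluation logs themselves are kept in the repository and are not reproduced here. Because spam is usually the minority class in SMS data, accuracy alone can look high even for a weak filter; precision and recall on the spam class are more telling. The sketch below computes these in plain Python, treating label 1 as spam as in `model_predict`; scikit-learn (cited below) provides the same metrics through `sklearn.metrics.precision_recall_fscore_support` and `classification_report`. `spam_metrics` is a hypothetical helper and the labels in the example are made up for illustration.

```python
def spam_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 with spam (label 1) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # spam caught
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # ham wrongly flagged
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # spam missed
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = [1, 0, 1, 0, 0, 1]  # illustrative ground truth
    y_pred = [1, 0, 0, 0, 1, 1]  # illustrative predictions
    print({k: round(v, 3) for k, v in spam_metrics(y_true, y_pred).items()})
    # {'accuracy': 0.667, 'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```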

How to use this model

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
tokenizer = BertTokenizer.from_pretrained('fzn0x/bert-spam-classification-model')
model = BertForSequenceClassification.from_pretrained('fzn0x/bert-spam-classification-model')

# Run on the GPU when available, otherwise on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # inference mode: disables dropout

def model_predict(text: str) -> str:
    # Tokenize (truncating overly long inputs) and move the tensors to the model's device
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)
    logits = outputs.logits
    # Pick the class with the highest logit: 1 = spam, 0 = ham
    prediction = torch.argmax(logits, dim=1).item()
    return 'SPAM' if prediction == 1 else 'HAM'

def predict():
    text = "Hello, do you know with this crypto you can be rich? contact us in 88888"
    predicted_label = model_predict(text)
    print(f"1. Predicted class: {predicted_label}") # EXPECT: SPAM

    text = "Help me richard!"
    predicted_label = model_predict(text)
    print(f"2. Predicted class: {predicted_label}") # EXPECT: HAM

    text = "You can buy loopstation for 100$, try buyloopstation.com"
    predicted_label = model_predict(text)
    print(f"3. Predicted class: {predicted_label}") # EXPECT: SPAM

    text = "Mate, I try to contact your phone, where are you?"
    predicted_label = model_predict(text)
    print(f"4. Predicted class: {predicted_label}") # EXPECT: HAM

if __name__ == "__main__":
    predict()
```
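`model_predict` returns only the argmax label. If you need a confidence score, or want to flag fewer legitimate messages by demanding more confidence before calling something spam, the two logits can be turned into a spam probability with a softmax. The sketch below does this in plain Python on a two-element logits list (for example `outputs.logits[0].tolist()` inside `model_predict`), assuming index 1 is spam as in the code above. With the default threshold of 0.5 it reproduces the argmax decision, including ties going to ham; the `threshold` parameter is a hypothetical knob, not something this model card defines.

```python
import math


def spam_probability(logits: list[float]) -> float:
    """Softmax over two-class logits [ham, spam]; returns P(spam)."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def label_with_threshold(logits: list[float], threshold: float = 0.5) -> str:
    """Return 'SPAM' only if P(spam) is strictly above the threshold."""
    return "SPAM" if spam_probability(logits) > threshold else "HAM"


if __name__ == "__main__":
    logits = [-1.0, 2.0]  # illustrative values, not real model output
    print(round(spam_probability(logits), 4))               # 0.9526
    print(label_with_threshold(logits))                     # SPAM
    print(label_with_threshold(logits, threshold=0.99))     # HAM
```

Raising the threshold trades missed spam for fewer false alarms, which matters most in the customer-service and inbox-filtering use cases, where blocking a legitimate message is costly.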

📚 Citation

If you use this repository or the ideas in it, please cite the following:

Full BibTeX entries are available in citations.bib.

Wolf et al., "Transformers: State-of-the-Art Natural Language Processing," EMNLP 2020 (ACL Anthology).
Pedregosa et al., "Scikit-learn: Machine Learning in Python," JMLR 2011.
Almeida & Gómez Hidalgo, "SMS Spam Collection v.1," UCI Machine Learning Repository, 2011 (also on Kaggle).