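🚀 Fine-tuned BERT-base-uncased Model for SMS Spam Classification
This project is a BERT-base-uncased pretrained model fine-tuned specifically for SMS spam classification. It labels a text message as spam or legitimate (ham), so it can serve as a filtering step in SMS processing.
🚀 Quick Start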
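This is my second project in natural language processing (NLP): I fine-tuned the bert-base-uncased model to classify SMS spam. It is a large improvement over my earlier project.
For the evaluation logs, see the GitHub repository.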
How to use this model
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = BertTokenizer.from_pretrained('fzn0x/bert-spam-classification-model')
model = BertForSequenceClassification.from_pretrained('fzn0x/bert-spam-classification-model')

# Use the GPU when one is available, and switch the model to inference mode.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()


def model_predict(text: str):
    # Tokenize the message and move the tensors to the model's device.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    # Gradients are not needed for inference.
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Class index 1 is spam; anything else is treated as ham.
        prediction = torch.argmax(logits, dim=1).item()
    return 'SPAM' if prediction == 1 else 'HAM'


def predict():
    text = "Hello, do you know with this crypto you can be rich? contact us in 88888"
    predicted_label = model_predict(text)
    print(f"1. Predicted class: {predicted_label}")

    text = "Help me richard!"
    predicted_label = model_predict(text)
    print(f"2. Predicted class: {predicted_label}")

    text = "You can buy loopstation for 100$, try buyloopstation.com"
    predicted_label = model_predict(text)
    print(f"3. Predicted class: {predicted_label}")

    text = "Mate, I try to contact your phone, where are you?"
    predicted_label = model_predict(text)
    print(f"4. Predicted class: {predicted_label}")


if __name__ == "__main__":
    predict()
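The usage code above returns only a label. If a confidence score is also wanted, one option is to apply a softmax to the two logits. The sketch below is an illustration rather than part of the released code: `label_with_confidence` and `LABELS` are hypothetical names, the label order (index 1 is SPAM) is taken from the usage code, and the function works on a plain list of floats so it can be tried without loading the model.

```python
import math

# Label order assumed from the usage code: index 0 is HAM, index 1 is SPAM.
LABELS = ("HAM", "SPAM")


def label_with_confidence(logits):
    """Return (label, probability) for one message's raw logits."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the most probable class, like torch.argmax in the usage code.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

With the model loaded, `outputs.logits[0].tolist()` would supply the input list for a single message. A threshold on the returned probability could then be used to flag borderline messages for review; any such threshold is a choice for the user, not something the source specifies.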
📚 Citation
If you use this repository or its ideas, please cite the works listed below. Full BibTeX entries are in citations.bib.
- Wolf et al., Transformers: State-of-the-Art Natural Language Processing, EMNLP 2020. ACL Anthology
- Pedregosa et al., Scikit-learn: Machine Learning in Python, JMLR 2011.
- Almeida & Gómez Hidalgo, SMS Spam Collection v.1, UCI Machine Learning Repository (2011). Kaggle link
🧠 Credits and Libraries Used
📄 License
This project is licensed under the MIT License.