spam-mail-classifier: an open-source email classification model that distinguishes spam from non-spam email subject lines

Spam Mail Classifier

Developed by Goodmotion

A text classification model fine-tuned from microsoft/Multilingual-MiniLM-L12-H384 that classifies email subject lines as spam (SPAM) or non-spam (NOSPAM).

Text classification

Transformers

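Open-source license: Apache-2.0 #MultilingualEmailClassification #LightweightTransformer #SpamDetection

Downloads: 943

Release date: 12/9/2024

Model overview

This model detects spam from email subject lines and supports multilingual text classification.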

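Model features

Multilingual support

Built on the Multilingual-MiniLM model, it supports spam detection across multiple languages.

Lightweight model

Uses the MiniLM architecture, which reduces compute requirements while maintaining performance.

Simple to use

Provides a clear API for quick integration into existing systems.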

Model capabilities

Text classification

Spam detection

Multilingual text processing

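Use cases

Email management

Spam filtering

Automatically identifies and filters spam email

Improves email processing efficiency and reduces disruption from spam

Email classification system

Automatically classifies email as spam or regular mail

Streamlines email management workflows

Security protection

Phishing email detection

Identifies potential phishing emails and fraudulent content

Strengthens email security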

🚀 Spam Mail Classifier

This model is a fine-tuned version of microsoft/Multilingual-MiniLM-L12-H384 that classifies email subject lines as spam (SPAM) or non-spam (NOSPAM).

🚀 Quick start

💻 Usage examples

Basic usage

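from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub
model_name = "Goodmotion/spam-mail-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name
)

# Classify a single subject line (French sample: "Congratulations! You have won an iPhone.")
text = "Félicitations ! Vous avez gagné un iPhone."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
# Raw, unnormalized scores: one row per input, one column per class
print(outputs.logits)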
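
The basic example prints raw logits, which are not yet a label or a probability. The following is a minimal sketch of that last step using only the standard library. It assumes index 0 is NOSPAM and index 1 is SPAM (the order used in the advanced example); `logits_to_prediction` is a hypothetical helper, and the logit values are made up for illustration.

```python
import math

# Assumed label order (index 0 = NOSPAM, index 1 = SPAM), matching the
# mapping used in the advanced example; verify it against the model config.
LABELS = ["NOSPAM", "SPAM"]

def logits_to_prediction(logits):
    """Convert one row of raw logits into (label, confidence) via softmax."""
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical logits for illustration only, not real model output.
label, confidence = logits_to_prediction([-1.2, 2.3])
print(label, f"{confidence:.2%}")
```

With the real model, a row of logits could be passed in as `outputs.logits[0].tolist()`.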

Advanced usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Goodmotion/spam-mail-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

texts = [
    'Join us for a webinar on AI innovations',
    'Urgent: Verify your account immediately.',
    'Meeting rescheduled to 3 PM',
    'Happy Birthday!',
    'Limited time offer: Act now!',
    'Join us for a webinar on AI innovations',
    'Claim your free prize now!',
    'You have unclaimed rewards waiting!',
    'Weekly newsletter from Tech World',
    'Update on the project status',
    'Lunch tomorrow at 12:30?',
    'Get rich quick with this amazing opportunity!',
    'Invoice for your recent purchase',
    'Don\'t forget: Gym session at 6 AM',
    'Join us for a webinar on AI innovations',
    'bonjour comment allez vous ?',
    'Documents suite à notre rendez-vous',
    'Valentin Dupond mentioned you in a comment',
    'Bolt x Supabase = 🤯',
    'Modification site web de la société',
    'Image de mise en avant sur les articles',
    'Bring new visitors to your site',
    'Le Cloud Éthique sans bullshit',
    'Remix Newsletter #25: React Router v7',
    'Votre essai auprès de X va bientôt prendre fin',
    'Introducing a Google Docs integration, styles and more in Claude.ai',
    'Carte de crédit sur le point d’expirer sur Cloudflare'
]
inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
outputs = model(**inputs)

# Convert the logits to probabilities with softmax
logits = outputs.logits
probabilities = torch.softmax(logits, dim=1)

# Decode the predicted class for each text
labels = ["NOSPAM", "SPAM"]  # Map class indices to labels
results = [
    {"text": text, "label": labels[torch.argmax(prob).item()], "confidence": prob.max().item()}
    for text, prob in zip(texts, probabilities)
]

# Print the results
for result in results:
    print(f"Text: {result['text']}")
    print(f"Result: {result['label']} (Confidence: {result['confidence']:.2%})\n")
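
For the spam-filtering use case, the `results` list built above can drive a simple routing step. The sketch below is one possible approach, not part of the model: `route_results` and its `spam_threshold` default of 0.90 are hypothetical, and the sample confidences are hand-written rather than model output.

```python
def route_results(results, spam_threshold=0.90):
    """Split classifier results into inbox, spam, and manual-review buckets."""
    buckets = {"inbox": [], "spam": [], "review": []}
    for result in results:
        if result["label"] == "NOSPAM":
            buckets["inbox"].append(result["text"])
        elif result["confidence"] >= spam_threshold:
            buckets["spam"].append(result["text"])
        else:
            # Low-confidence SPAM predictions are held for a human to check.
            buckets["review"].append(result["text"])
    return buckets

# Hand-written sample in the same shape as `results` above; the confidence
# values are made up for illustration and are not model output.
sample = [
    {"text": "Claim your free prize now!", "label": "SPAM", "confidence": 0.98},
    {"text": "Meeting rescheduled to 3 PM", "label": "NOSPAM", "confidence": 0.95},
    {"text": "Bring new visitors to your site", "label": "SPAM", "confidence": 0.61},
]
buckets = route_results(sample)
print(buckets)
```

Holding low-confidence spam predictions for review trades some manual effort for fewer legitimate messages lost; the threshold would need tuning on real mail.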

📚 Documentation

Model details

Attribute	Details
Model type	Base model: `microsoft/Multilingual-MiniLM-L12-H384`
Fine-tuning objective	Text classification
Number of classes	2 (SPAM, NOSPAM)
Languages	Multilingual
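
Because the model has two classes, the order of the label list matters when decoding predictions. Rather than hard-coding that order, it can be derived from an index-to-label mapping such as the `id2label` attribute that Transformers model configs expose. The sketch below assumes such a mapping is available; `labels_from_id2label` is a hypothetical helper and the example dictionary is illustrative, not read from this model.

```python
def labels_from_id2label(id2label):
    """Return label names ordered by class index from an id2label mapping."""
    # Keys may be ints or strings depending on how the config was loaded.
    return [id2label[key] for key in sorted(id2label, key=int)]

# Hypothetical mapping for illustration; the real one would come from
# `model.config.id2label` after loading the model.
example_id2label = {1: "SPAM", 0: "NOSPAM"}
labels = labels_from_id2label(example_id2label)
print(labels)
```

If the loaded config only carries generic names such as `LABEL_0` and `LABEL_1`, the mapping to SPAM and NOSPAM still has to be confirmed separately.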