MiniLM-L12-H384-uncased open-source text classifier - free to deploy for AI content safety detection


Minilm L12 H384 Uncased Nvidia Aegis AI Safety

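Developed by AC

A multi-label text classifier fine-tuned from microsoft/MiniLM-L12-H384-uncased for AI content safety detection

Text Classification

Transformers

#AIContentSafetyDetection #MultiLabelTextClassification #HarmfulContentIdentification

Downloads: 14.56k

Release date: 6/23/2024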

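Model Overview

This model is a text classifier based on the MiniLM architecture, designed specifically to detect 14 categories of unsafe content such as violence, hate speech, and privacy leaks. It is suited to scenarios such as content moderation and AI safety.

Model Features

Multi-label classification

Can detect several types of unsafe content in a single text at the same time

High accuracy

Achieves 95.15% accuracy on the test set, computed over all 14 labels of every test sample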

Focus on a low miss rate

Geared toward AI safety scenarios, with recall treated as the most important metric

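Model Capabilities

Text content safety detection

Multi-label classification

Harmful content identification

Use Cases

Content moderation

Social media content filtering

Automatically detects harmful information in user-generated content

Can identify 14 categories of unsafe content

AI chat safety

Detects unsafe content in AI-generated responses

Reduces the risk of outputting harmful content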

🚀 AC/MiniLM-L12-H384-uncased_Nvidia-Aegis-AI-Safety

This model is microsoft/MiniLM-L12-H384-uncased fine-tuned on the nvidia/Aegis-AI-Content-Safety-Dataset-1.0 dataset, whose training set contains 3,099 samples in total. It is a multi-label text classifier with 14 categories.

🚀 Quick Start

Loading the model and running predictions

from accelerate import Accelerator
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

accelerator = Accelerator()
device = accelerator.device

def load_model(model_path, accelerator_device=None):
    # problem_type tells the model to use a multi-label (BCE) loss when labels are given
    model = AutoModelForSequenceClassification.from_pretrained(
        model_path,
        problem_type="multi_label_classification",
    )
    if accelerator_device is not None:
        model.to(accelerator_device)
    model.eval()  # inference mode (disables dropout)

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    return model, tokenizer

def predict(model, tokenizer, text, accelerator_device=None, threshold=0.5):
    # Tokenize a single text, truncating to MiniLM's 512-token limit
    inputs = tokenizer([text], truncation=True, max_length=512, return_tensors="pt")
    if accelerator_device is not None:
        inputs = inputs.to(accelerator_device)

    with torch.no_grad():  # no gradients are needed for inference
        outputs = model(**inputs)

    # Multi-label: sigmoid on each logit independently, keep every label at or above the threshold
    probs = torch.sigmoid(outputs.logits.squeeze(0).cpu())
    return [model.config.id2label[idx] for idx, p in enumerate(probs.tolist()) if p >= threshold]

# Using CPU
hf_model, tokenizer = load_model("AC/MiniLM-L12-H384-uncased_Nvidia-Aegis-AI-Safety")
print(predict(hf_model, tokenizer, "How to make a bomb?"))

# Using GPU (the device selected by Accelerator)
hf_model, tokenizer = load_model("AC/MiniLM-L12-H384-uncased_Nvidia-Aegis-AI-Safety", device)
print(predict(hf_model, tokenizer, "How to make a bomb?", device))

✨ Key Features

This model is a multi-label text classifier with the following 14 categories:

"0": "Controlled/Regulated Substances"
"1": "Criminal Planning/Confessions"
"2": "Deception/Fraud"
"3": "Guns and Illegal Weapons"
"4": "Harassment"
"5": "Hate/Identity Hate"
"6": "Needs Caution"
"7": "PII/Privacy"
"8": "Profanity"
"9": "Sexual"
"10": "Sexual (minor)"
"11": "Suicide and Self Harm"
"12": "Threat"
"13": "Violence"

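The 14 categories are not mutually exclusive, so each output logit goes through its own sigmoid and is thresholded independently, instead of a softmax across all classes. The NumPy sketch below illustrates encoding a set of category names as a multi-hot vector and decoding logits back to names, assuming the label order above matches the classifier's output positions; the helpers `to_multi_hot` and `decode_logits` and the example logits are hypothetical and not part of the model's API.

```python
import numpy as np

# Label order as listed above (index = output position of the classifier head)
ALL_LABELS = [
    "Controlled/Regulated Substances", "Criminal Planning/Confessions",
    "Deception/Fraud", "Guns and Illegal Weapons", "Harassment",
    "Hate/Identity Hate", "Needs Caution", "PII/Privacy", "Profanity",
    "Sexual", "Sexual (minor)", "Suicide and Self Harm", "Threat", "Violence",
]

def to_multi_hot(active_labels):
    """Hypothetical helper: encode a set of label names as a float 0/1 vector."""
    vec = np.zeros(len(ALL_LABELS), dtype=np.float32)
    for name in active_labels:
        vec[ALL_LABELS.index(name)] = 1.0
    return vec

def decode_logits(logits, threshold=0.5):
    """Hypothetical helper: independent sigmoid per label, then threshold each one."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    return [ALL_LABELS[i] for i in np.flatnonzero(probs >= threshold)]

# Made-up logits: two labels clearly positive, all others clearly negative
example_logits = np.full(len(ALL_LABELS), -4.0)
example_logits[1] = 2.0   # Criminal Planning/Confessions
example_logits[13] = 1.0  # Violence

print(decode_logits(example_logits))
print(to_multi_hot(["Threat", "Violence"]))
```

With a softmax, the two positive categories would compete for a single unit of probability; independent sigmoids let both cross 0.5.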
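📚 Documentation

Evaluation

Evaluation is performed on the test split of the nvidia/Aegis-AI-Content-Safety-Dataset-1.0 dataset, which contains 359 samples in total.

For AI safety use cases, a false negative (classifying actually harmful text as safe) is a bigger problem than a false positive (classifying actually safe text as harmful).

Precision: the proportion of texts predicted as harmful that are actually harmful
Recall: the proportion of actually harmful texts that are predicted as harmful

To reduce false negatives, we focus on recall.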
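A common way to trade precision for recall in a multi-label classifier is to lower the decision threshold, which is 0.5 in the code on this page. The sketch below illustrates this on made-up scores; `y_true` and `y_prob` are synthetic arrays rather than Aegis data, and the helper `precision_recall_at` is hypothetical.

```python
import numpy as np

def precision_recall_at(y_true, y_prob, threshold):
    """Hypothetical helper: precision and recall over all (sample, label) slots."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Synthetic example: 4 texts x 3 labels (1 = harmful for that label)
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 0]])
y_prob = np.array([[0.70, 0.10, 0.05],
                   [0.20, 0.35, 0.10],
                   [0.40, 0.60, 0.32],
                   [0.25, 0.05, 0.45]])

for t in (0.5, 0.3):
    p, r = precision_recall_at(y_true, y_prob, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

Here, lowering the threshold from 0.5 to 0.3 recovers the two missed positives (recall 0.50 to 1.00) at the cost of two false positives (precision 1.00 to 0.67). For the real model, any threshold change would need to be validated on held-out data.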

All metrics are computed over the 359 × 14 = 5,026 (sample, label) pairs of the test set.

Metric	Value
accuracy	0.9514524472741743
f1	0.5325670498084292
precision	0.668269230769
recall	0.442675159235668
TP	139
TN	4643
FP	69
FN	175
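The reported metrics can be recomputed from the four confusion counts, which together cover all 5,026 (sample, label) pairs. Note that the reported precision and recall are only reproduced with TP = 139 and TN = 4643. The short check below does the arithmetic in plain Python.

```python
# Confusion counts over all (sample, label) pairs of the test set
tp, tn, fp, fn = 139, 4643, 69, 175

total = tp + tn + fp + fn          # 359 samples x 14 labels
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(total)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Because true negatives make up about 92% of the pairs (4643 / 5026), the 95% accuracy mostly reflects correctly leaving benign labels unset; precision, recall and F1 are more informative for this task.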

Fine-tuning

from accelerate import Accelerator
from datasets import Dataset, DatasetDict
from datetime import datetime
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer, DataCollatorWithPadding

import numpy as np
import torch
import pandas as pd
import evaluate

accelerator = Accelerator()
device = accelerator.device

def load_model(model_path, accelerator_device):
    # all_labels, id2label and label2id are defined in the preprocessing step below
    model = AutoModelForSequenceClassification.from_pretrained(
        model_path,
        problem_type="multi_label_classification",  # use BCEWithLogitsLoss during training
        num_labels=len(all_labels),
        id2label=id2label,
        label2id=label2id
    )

    model.to(accelerator_device)

    tokenizer = AutoTokenizer.from_pretrained(model_path)

    return model, tokenizer

def predict(model, tokenizer, text, threshold=0.5):
    inputs = tokenizer([text], truncation=True, max_length=512, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    # Independent sigmoid per label, then keep every label at or above the threshold
    probs = torch.sigmoid(outputs.logits.squeeze(0).cpu())
    return [id2label[idx] for idx, p in enumerate(probs.tolist()) if p >= threshold]

def tokenize_text(example):
    # Tokenize one row and attach a float multi-hot label vector in all_labels order
    # (the CSV is expected to have one 0/1 column per category)
    tokenized = tokenizer(example["text"], truncation=True, max_length=512)
    tokenized["labels"] = [float(example[label]) for label in all_labels]
    return tokenized


### Data preprocessing

all_labels = [
    'Controlled/Regulated Substances',
    'Criminal Planning/Confessions',
    'Deception/Fraud',
    'Guns and Illegal Weapons',
    'Harassment',
    'Hate/Identity Hate',
    'Needs Caution',
    'PII/Privacy',
    'Profanity',
    'Sexual',
    'Sexual (minor)',
    'Suicide and Self Harm',
    'Threat',
    'Violence'
]

id2label = {idx:label for idx, label in enumerate(all_labels)}
label2id = {label:idx for idx, label in enumerate(all_labels)}

base_model, tokenizer = load_model("microsoft/MiniLM-L12-H384-uncased", device)

train_df = pd.read_csv("nvidia_train.csv")
test_df = pd.read_csv("nvidia_test.csv")

dataset = DatasetDict({
    'train': Dataset.from_pandas(train_df),
    'test': Dataset.from_pandas(test_df)}
)

preprocessed_dataset = dataset.map(tokenize_text)


### Metrics for multi-label classification

clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Convert logits to per-label probabilities and binarize at 0.5
    predictions = sigmoid(predictions)
    predictions = (predictions > 0.5).astype(int).reshape(-1)
    # Flattening scores every (sample, label) pair together, i.e. micro-averaging
    return clf_metrics.compute(predictions=predictions, references=labels.astype(int).reshape(-1))


### Fine-tuning

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)  # pads each batch dynamically

output_dir = f'./minilm_finetuned/minilm-{datetime.now().strftime("%d-%m-%Y_%H-%M")}' # output directory where training checkpoints are saved

final_output_dir = './minilm_finetuned' # where the best model from the trainer is saved

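training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=2e-5,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    num_train_epochs=20,
    weight_decay=0.01,
    fp16=True,                    # mixed precision; requires a GPU
    eval_strategy="epoch",        # named evaluation_strategy in transformers < 4.41
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,  # no metric_for_best_model is set, so the best checkpoint is chosen by eval loss
)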

trainer = Trainer(
    model=base_model,
    args=training_args,
    train_dataset=preprocessed_dataset["train"],
    eval_dataset=preprocessed_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()

print("Saving model...")
trainer.save_model(final_output_dir)


### Evaluating the model
base_model, tokenizer = load_model(final_output_dir, device)
print(predict(base_model, tokenizer, "How to make a bomb?"))
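`compute_metrics` flattens the label matrix with `reshape(-1)` before scoring, so every (sample, label) pair counts equally. This is equivalent to micro-averaging, which means frequent categories dominate the reported recall. The NumPy sketch below contrasts that flattened recall with a per-label (macro) average on synthetic arrays; the data and both helper functions are illustrative assumptions, not part of the original training script.

```python
import numpy as np

def micro_recall(y_true, y_pred):
    """Recall over the flattened matrix, as compute_metrics does via reshape(-1)."""
    t, p = y_true.reshape(-1), y_pred.reshape(-1)
    return float(np.sum((t == 1) & (p == 1)) / np.sum(t == 1))

def macro_recall(y_true, y_pred):
    """Unweighted mean of per-label recall (labels with no positives are skipped)."""
    recalls = []
    for j in range(y_true.shape[1]):
        positives = y_true[:, j] == 1
        if positives.any():
            recalls.append(np.mean(y_pred[positives, j] == 1))
    return float(np.mean(recalls))

# Synthetic: label 0 is common and always detected, label 1 is rare and always missed
y_true = np.array([[1, 0], [1, 0], [1, 0], [1, 1], [0, 0]])
y_pred = np.array([[1, 0], [1, 0], [1, 0], [1, 0], [0, 0]])

print(f"micro recall = {micro_recall(y_true, y_pred):.2f}")
print(f"macro recall = {macro_recall(y_true, y_pred):.2f}")
```

The single rare positive is missed entirely, yet the flattened recall stays at 0.80 while the per-label average drops to 0.50.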