nsfw_image_detection open-source model - accurately distinguishes normal content from NSFW content to help keep visual environments safe

Nsfw Image Detection

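Developed by Falconsai

An NSFW image classification model based on the ViT architecture. It was pre-trained on the ImageNet-21k dataset and fine-tuned on about 80,000 images to distinguish normal content from NSFW content.

Image Classification

Transformers

Open-source license: Apache-2.0 #NSFW image detection #High-accuracy classification #Content moderation

Downloads: 82.4M

Release date: 10/13/2023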

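Model Overview

This model is mainly used to classify NSFW (not safe for work) images and is suited to filtering explicit or inappropriate content in a variety of applications.

Model Features

High-performance classification

Achieves 98.04% accuracy on the evaluation dataset, effectively distinguishing normal content from NSFW content.

ViT architecture

Uses the Vision Transformer architecture, which brings the strengths of the Transformer to image processing.

Large-scale training data

Fine-tuned on a proprietary dataset of about 80,000 images that covers highly diverse content.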

Model Capabilities

NSFW image classification

Image content recognition

Sensitive content filtering

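Use Cases

Content moderation

Social media content filtering

Automatically detects and filters inappropriate content on social media platforms.

Reduces the manual review workload and raises content safety standards.

Workplace content management

Deployed in internal corporate systems to prevent NSFW content from spreading in the workplace.

Helps maintain a professional work environment and reduce legal risk.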

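🚀 Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification

This is an image classification model built on the Transformer encoder architecture. It has been fine-tuned specifically for NSFW image classification.

🚀 Quick Start

The model is optimized for NSFW image classification. The examples below show how to use it to classify an image.

Example: classifying an image with the model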

# Use a pipeline as a high-level helper
from PIL import Image
from transformers import pipeline

img = Image.open("<path_to_image_file>")
classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")
print(classifier(img))  # list of {"label": ..., "score": ...} dicts, one per class
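
The pipeline returns one `{"label", "score"}` dictionary per class. As a minimal sketch, that list can be reduced to a yes/no decision as shown below. The helper name `is_nsfw`, the label string `"nsfw"`, and the 0.5 threshold are assumptions for illustration; check `model.config.id2label` for the actual label names and tune the threshold on your own data.

```python
def is_nsfw(results, nsfw_label="nsfw", threshold=0.5):
    """Return True if the score for `nsfw_label` meets the threshold.

    `results` is the list of {"label": str, "score": float} dicts that an
    image-classification pipeline returns for a single image.
    """
    scores = {r["label"]: r["score"] for r in results}
    # A label that is absent from the results counts as score 0.0.
    return scores.get(nsfw_label, 0.0) >= threshold


# Hand-written sample in the pipeline's output shape (not real model output).
sample = [{"label": "normal", "score": 0.97}, {"label": "nsfw", "score": 0.03}]
print(is_nsfw(sample))
```

Raising the threshold lets fewer images through as NSFW, and lowering it flags more; which direction is safer depends on the application.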

Example: loading the model directly

# Load model directly
import torch
from PIL import Image
from transformers import AutoModelForImageClassification, ViTImageProcessor

img = Image.open("<path_to_image_file>")
model = AutoModelForImageClassification.from_pretrained("Falconsai/nsfw_image_detection")
processor = ViTImageProcessor.from_pretrained('Falconsai/nsfw_image_detection')
with torch.no_grad():
    inputs = processor(images=img, return_tensors="pt")
    outputs = model(**inputs)
    logits = outputs.logits

predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
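
The example above reports only the top label. If a confidence value is also needed, the logits can be passed through a softmax. The sketch below uses plain Python so it runs without the model; the helper names, the logit values, and the `id2label` mapping are illustrative assumptions. In practice you would pass `logits[0].tolist()` and `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() from overflowing on large logits.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_probabilities(logits, id2label):
    """Map each class label to its softmax probability."""
    return {id2label[i]: p for i, p in enumerate(softmax(logits))}


# Illustrative logits and label mapping, not real model output.
example = label_probabilities([2.0, -1.0], {0: "normal", 1: "nsfw"})
print(example)
```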

Example: running the YOLO version

import os
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import onnxruntime as ort
import json

# Predict using YOLOv9 model
def predict_with_yolov9(image_path, model_path, labels_path, input_size):
    """
    Run inference using the converted YOLOv9 model on a single image.

    Args:
        image_path (str): Path to the input image file.
        model_path (str): Path to the ONNX model file.
        labels_path (str): Path to the JSON file containing class labels.
        input_size (tuple): The expected input size (height, width) for the model.

    Returns:
        str: The predicted class label.
        PIL.Image.Image: The original loaded image.
    """
    def load_json(file_path):
        with open(file_path, "r") as f:
            return json.load(f)

    # Load labels
    labels = load_json(labels_path)

    # Preprocess image
    original_image = Image.open(image_path).convert("RGB")
    # PIL's resize() expects (width, height), so swap the (height, width) tuple
    image_resized = original_image.resize((input_size[1], input_size[0]), Image.Resampling.BILINEAR)
    image_np = np.array(image_resized, dtype=np.float32) / 255.0
    image_np = np.transpose(image_np, (2, 0, 1))  # [C, H, W]
    input_tensor = np.expand_dims(image_np, axis=0).astype(np.float32)

    # Load YOLOv9 model
    session = ort.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name # Assuming classification output

    # Run inference
    outputs = session.run([output_name], {input_name: input_tensor})
    predictions = outputs[0]

    # Postprocess predictions (assuming classification output)
    # Adapt this section if your model output is different (e.g., detection boxes)
    predicted_index = np.argmax(predictions)
    predicted_label = labels[str(predicted_index)] # Assumes labels are indexed by string numbers

    return predicted_label, original_image

# Display prediction for a single image
def display_single_prediction(image_path, model_path, labels_path, input_size):
    """
    Predicts the class for a single image and displays the image with its prediction.

    Args:
        image_path (str): Path to the input image file.
        model_path (str): Path to the ONNX model file.
        labels_path (str): Path to the JSON file containing class labels.
        input_size (tuple): The expected input size (height, width) for the model.
    """
    try:
        # Run prediction
        prediction, img = predict_with_yolov9(image_path, model_path, labels_path, input_size)

        # Display image and prediction
        fig, ax = plt.subplots(1, 1, figsize=(8, 8)) # Create a single plot
        ax.imshow(img)
        ax.set_title(f"Prediction: {prediction}", fontsize=14)
        ax.axis("off") # Hide axes ticks and labels

        plt.tight_layout()
        plt.show()

    except FileNotFoundError as e:
        # The missing file may be the image or the labels JSON
        print(f"Error: file not found: {e.filename}")
    except Exception as e:
        print(f"An error occurred: {e}")


# --- Main Execution ---

# Paths and parameters - **MODIFY THESE**
single_image_path = "path/to/your/single_image.jpg"  # <--- Replace with the actual path to your image file
model_path = "path/to/your/yolov9_model.onnx"    # <--- Replace with the actual path to your ONNX model
labels_path = "path/to/your/labels.json"        # <--- Replace with the actual path to your labels JSON file
input_size = (224, 224)                         # Standard input size, adjust if your model differs

# Check if the image file exists before proceeding (optional but recommended)
if os.path.exists(single_image_path):
    # Run prediction and display for the single image
    display_single_prediction(single_image_path, model_path, labels_path, input_size)
else:
    print(f"Error: The specified image file does not exist: {single_image_path}")
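
The script above looks up the predicted class with `labels[str(predicted_index)]`, so the labels file must be a JSON object keyed by string indices. A minimal sketch of writing and reading such a file follows. The two class names and the temporary path are assumptions for illustration; use the class list your ONNX model was exported with.

```python
import json
import os
import tempfile

# Hypothetical two-class mapping keyed by string indices.
demo_labels = {"0": "normal", "1": "nsfw"}

# Write the mapping to a temporary labels.json file.
demo_labels_path = os.path.join(tempfile.mkdtemp(), "labels.json")
with open(demo_labels_path, "w") as f:
    json.dump(demo_labels, f)

# Read it back the same way the script's load_json helper does.
with open(demo_labels_path, "r") as f:
    loaded = json.load(f)

predicted_index = 1  # stand-in for the np.argmax result
print(loaded[str(predicted_index)])
```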

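✨ Key Features

NSFW image classification: The model specializes in classifying NSFW (not safe for work) images and is suited to filtering explicit or inappropriate content in a variety of applications.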

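🔧 Technical Details

Model Description

The fine-tuned Vision Transformer (ViT) is a variant of the Transformer encoder architecture, similar to BERT, adapted for image classification. The base checkpoint, "google/vit-base-patch16-224-in21k", was pre-trained in a supervised fashion on a large collection of images from the ImageNet-21k dataset. Images in the pre-training dataset were resized to 224x224 pixels, which makes the model suitable for a wide range of image recognition tasks.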
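
The checkpoint name encodes the input geometry: 16x16-pixel patches on a 224x224 image. The short calculation below shows how many tokens the encoder sees per image. The extra classification token and the RGB channel count follow the standard ViT design and are stated here as assumptions about this checkpoint rather than details given in the text.

```python
IMAGE_SIZE = 224   # input resolution, from the text
PATCH_SIZE = 16    # from "patch16" in the checkpoint name
CHANNELS = 3       # RGB input (assumed)

patches_per_side = IMAGE_SIZE // PATCH_SIZE          # 14
num_patches = patches_per_side ** 2                  # 196
sequence_length = num_patches + 1                    # 197, with one classification token
values_per_patch = PATCH_SIZE * PATCH_SIZE * CHANNELS  # 768 raw pixel values per patch

print(patches_per_side, num_patches, sequence_length, values_per_patch)
```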

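During training, the hyperparameters were chosen with care to get the best model performance. The model was fine-tuned with a batch size of 16, a choice that balances computational efficiency against the model's ability to process and learn from a diverse set of images.

Fine-tuning used a learning rate of 5e-5. The learning rate is a key hyperparameter that sets the size of the adjustments made to the model's parameters during training. Here, 5e-5 was chosen to balance fast convergence against stable optimization, so the model learned quickly while steadily refining its capabilities throughout training.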
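
To make the two hyperparameters concrete, the sketch below shows how the batch size determines the number of update steps per epoch and how the learning rate scales a single update. The training-set size of 64,000 is hypothetical, since the text does not state the train/evaluation split, and the update rule is plain gradient descent for illustration only; the optimizer actually used is not stated.

```python
import math

BATCH_SIZE = 16        # stated in the text
LEARNING_RATE = 5e-5   # stated in the text


def steps_per_epoch(num_train_images, batch_size=BATCH_SIZE):
    """Number of update steps needed to see every training image once."""
    return math.ceil(num_train_images / batch_size)


def gradient_step(weight, gradient, lr=LEARNING_RATE):
    """One plain gradient-descent update: move against the gradient, scaled by lr."""
    return weight - lr * gradient


# Hypothetical training-set size (not given in the source).
print(steps_per_epoch(64_000))
# With lr = 5e-5, a gradient of 2.0 shifts the weight by only 1e-4.
print(gradient_step(1.0, 2.0))
```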

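Training used a proprietary dataset of about 80,000 images split into two classes, "normal" and "nsfw". The variety in this dataset let the model pick up subtle visual patterns and accurately distinguish safe content from explicit content.

The main goal of this training process was to give the model a deep understanding of visual cues, so that it is robust and capable at the specific task of NSFW image classification. The result is a model that can contribute meaningfully to content safety and moderation while maintaining a high standard of accuracy and reliability.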

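Training Data

The training data consists of a proprietary dataset of about 80,000 images. It is highly diverse and is split into two classes, "normal" and "nsfw". Training on this data is intended to equip the model to distinguish safe content from explicit content effectively.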

Training Statistics

- eval_loss: 0.07463177293539047
- eval_accuracy: 0.980375
- eval_runtime: 304.9846
- eval_samples_per_second: 52.462
- eval_steps_per_second: 3.279
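
The throughput figures above can be combined to estimate the size of the evaluation run. The calculation below is a back-of-the-envelope sketch: the sample count, step count, and batch size are inferred from rounded reported rates, not stated in the source, and the runtime is assumed to be in seconds.

```python
eval_stats = {
    "eval_accuracy": 0.980375,
    "eval_runtime": 304.9846,            # assumed to be seconds
    "eval_samples_per_second": 52.462,
    "eval_steps_per_second": 3.279,
}

# samples = runtime * samples/second; steps = runtime * steps/second
eval_samples = round(eval_stats["eval_runtime"] * eval_stats["eval_samples_per_second"])
eval_steps = round(eval_stats["eval_runtime"] * eval_stats["eval_steps_per_second"])
eval_batch_size = eval_samples / eval_steps

# Accuracy times sample count gives the number of correct predictions.
correct = round(eval_stats["eval_accuracy"] * eval_samples)
misclassified = eval_samples - correct

print(eval_samples, eval_steps, eval_batch_size, correct, misclassified)
```

Under these assumptions the evaluation covered roughly 16,000 images in about 1,000 batches of 16, with about 314 misclassified.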