nsfw_image_detection open-source model: distinguishes normal from NSFW images for content moderation

Nsfw Image Detection

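Developed by Falconsai

A ViT-based NSFW image classification model. It was pretrained with supervised learning on the ImageNet-21k dataset and fine-tuned on about 80,000 images to distinguish normal from NSFW content.

Image classification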

Transformers

License: Apache-2.0 #NSFW image detection #High-accuracy classification #Content moderation

Downloads: 82.4M

Released: 10/13/2023

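Model overview

The model classifies images as NSFW (not safe for work) or normal, and is suited to filtering explicit or inappropriate content in a range of applications.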

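Model features

High-performance classification

Reaches 98.04% accuracy on the evaluation set when distinguishing normal from NSFW content.

ViT-based architecture

Uses the Vision Transformer architecture, which applies a Transformer encoder to a sequence of image patches.

Large-scale training data

Fine-tuned on a proprietary dataset of about 80,000 images with highly varied content.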

Model capabilities

NSFW image classification

Image content recognition

Sensitive content filtering

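Use cases

Content moderation

Social media content filtering

Automatically detects and filters inappropriate content on social media platforms.

Reduces manual review workload and helps raise content safety standards.

Workplace content management

Deployed in internal enterprise systems to keep NSFW content from circulating in the workplace.

Helps maintain a professional work environment and lower legal risk.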

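🚀 Model card: Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification

This model is a fine-tuned Vision Transformer (ViT) built specifically for NSFW (not safe for work) image classification. It is based on the Transformer architecture and separates normal images from inappropriate ones, which makes it useful for content safety and moderation.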

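🚀 Quick start

Model use

The model is intended mainly for NSFW image classification and can be used to filter explicit or inappropriate content in a range of applications.

Usage

The following example code classifies an image with the model:

Basic usage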

# Classify an image with the pipeline API
from PIL import Image
from transformers import pipeline

img = Image.open("<path_to_image_file>")
classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")
print(classifier(img))
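
The pipeline call returns a list of dictionaries, each with a `label` and a `score`. A minimal sketch of turning that output into a filtering decision follows. The helper name `is_nsfw`, the 0.5 threshold, and the sample scores are illustrative assumptions, and the label string `nsfw` is assumed to match the class name described for the training data.

```python
def is_nsfw(results, threshold=0.5):
    """Return True if the 'nsfw' score in pipeline-style output meets the threshold.

    `results` is a list of {"label": str, "score": float} dicts, the shape
    returned by an image-classification pipeline.
    """
    scores = {item["label"].lower(): item["score"] for item in results}
    return scores.get("nsfw", 0.0) >= threshold


# Hypothetical output for one image (made-up scores, for illustration only)
sample = [
    {"label": "normal", "score": 0.93},
    {"label": "nsfw", "score": 0.07},
]
print(is_nsfw(sample))                  # False
print(is_nsfw(sample, threshold=0.05))  # True
```

A lower threshold flags more images for review at the cost of more false positives, so the value is best chosen against a labeled sample of the target application's own images.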

Advanced usage

# Load the model directly and classify an image
import torch
from PIL import Image
from transformers import AutoModelForImageClassification, ViTImageProcessor

img = Image.open("<path_to_image_file>")
model = AutoModelForImageClassification.from_pretrained("Falconsai/nsfw_image_detection")
processor = ViTImageProcessor.from_pretrained('Falconsai/nsfw_image_detection')
with torch.no_grad():
    inputs = processor(images=img, return_tensors="pt")
    outputs = model(**inputs)
    logits = outputs.logits

predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
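
Taking the `argmax` gives only the winning class. When a confidence value is also needed, the logits can be converted to probabilities with a softmax. The following is a minimal pure-Python sketch using made-up logit values; with the tensors from the code above, `torch.softmax(logits, dim=-1)` performs the same computation.

```python
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for two classes, for illustration only
example_logits = [2.0, -1.0]
probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, round(probs[best], 4))
```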

Usage with the YOLO version

import os
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import onnxruntime as ort
import json

# Predict using YOLOv9 model
def predict_with_yolov9(image_path, model_path, labels_path, input_size):
    """
    Run inference using the converted YOLOv9 model on a single image.

    Args:
        image_path (str): Path to the input image file.
        model_path (str): Path to the ONNX model file.
        labels_path (str): Path to the JSON file containing class labels.
        input_size (tuple): The expected input size (width, height) for the model, in the order PIL's resize() takes.

    Returns:
        str: The predicted class label.
        PIL.Image.Image: The original loaded image.
    """
    def load_json(file_path):
        with open(file_path, "r") as f:
            return json.load(f)

    # Load labels
    labels = load_json(labels_path)

    # Preprocess image
    original_image = Image.open(image_path).convert("RGB")
    image_resized = original_image.resize(input_size, Image.Resampling.BILINEAR)
    image_np = np.array(image_resized, dtype=np.float32) / 255.0
    image_np = np.transpose(image_np, (2, 0, 1))  # [C, H, W]
    input_tensor = np.expand_dims(image_np, axis=0).astype(np.float32)

    # Load YOLOv9 model
    session = ort.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name # Assuming classification output

    # Run inference
    outputs = session.run([output_name], {input_name: input_tensor})
    predictions = outputs[0]

    # Postprocess predictions (assuming classification output)
    # Adapt this section if your model output is different (e.g., detection boxes)
    predicted_index = np.argmax(predictions)
    predicted_label = labels[str(predicted_index)] # Assumes labels are indexed by string numbers

    return predicted_label, original_image

# Display prediction for a single image
def display_single_prediction(image_path, model_path, labels_path, input_size):
    """
    Predicts the class for a single image and displays the image with its prediction.

    Args:
        image_path (str): Path to the input image file.
        model_path (str): Path to the ONNX model file.
        labels_path (str): Path to the JSON file containing class labels.
        input_size (tuple): The expected input size (width, height) for the model, in the order PIL's resize() takes.
    """
    try:
        # Run prediction
        prediction, img = predict_with_yolov9(image_path, model_path, labels_path, input_size)

        # Display image and prediction
        fig, ax = plt.subplots(1, 1, figsize=(8, 8)) # Create a single plot
        ax.imshow(img)
        ax.set_title(f"Prediction: {prediction}", fontsize=14)
        ax.axis("off") # Hide axes ticks and labels

        plt.tight_layout()
        plt.show()

    except FileNotFoundError:
        print(f"Error: Image file not found at {image_path}")
    except Exception as e:
        print(f"An error occurred: {e}")


# --- Main Execution ---

# Paths and parameters - **MODIFY THESE**
single_image_path = "path/to/your/single_image.jpg"  # <--- Replace with the actual path to your image file
model_path = "path/to/your/yolov9_model.onnx"    # <--- Replace with the actual path to your ONNX model
labels_path = "path/to/your/labels.json"        # <--- Replace with the actual path to your labels JSON file
input_size = (224, 224)                         # Standard input size, adjust if your model differs

# Check if the image file exists before proceeding (optional but recommended)
if os.path.exists(single_image_path):
    # Run prediction and display for the single image
    display_single_prediction(single_image_path, model_path, labels_path, input_size)
else:
    print(f"Error: The specified image file does not exist: {single_image_path}")
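
The function above looks up `labels[str(predicted_index)]`, so the labels file is expected to be a JSON object keyed by the class index as a string. A sketch of writing and reading such a file follows. The class order shown here (`0` for normal, `1` for nsfw) and the temporary file location are assumptions and must be checked against the exported model.

```python
import json
import os
import tempfile

# Assumed index-to-label mapping; verify it against your exported model
labels = {"0": "normal", "1": "nsfw"}

# Write the mapping to a labels.json file in a temporary directory
labels_path = os.path.join(tempfile.mkdtemp(), "labels.json")
with open(labels_path, "w") as f:
    json.dump(labels, f, indent=2)

# Read it back the same way load_json() does in the function above
with open(labels_path, "r") as f:
    loaded = json.load(f)

predicted_index = 1  # stand-in for np.argmax(predictions)
print(loaded[str(predicted_index)])
```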

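✨ Key features

Targeted fine-tuning: the model is fine-tuned specifically for NSFW image classification, so that it separates normal images from inappropriate ones.
High performance: evaluation accuracy reaches 0.980375 (about 98.04%).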

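📦 Installation

The source documentation gives no installation instructions. The examples on this page import transformers, torch, and PIL (Pillow), so those packages are needed; see the general installation instructions for the Hugging Face libraries.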

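📚 Detailed documentation

Model description

The fine-tuned Vision Transformer (ViT) is a variant of the Transformer encoder architecture, similar to BERT, adapted for image classification. The base checkpoint, "google/vit-base-patch16-224-in21k", was pretrained with supervision on the ImageNet-21k dataset. During pretraining, images were resized to 224x224 pixels, which makes the checkpoint a suitable starting point for a wide range of image recognition tasks.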
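
The `patch16-224` part of the checkpoint name describes how an image is tokenized: a 224x224 input is cut into 16x16 patches, and each patch becomes one token. The arithmetic can be sketched as follows; the extra classification token is part of the standard ViT design, and the function name is only illustrative.

```python
def vit_sequence_length(image_size=224, patch_size=16):
    """Number of tokens a ViT encoder sees: one per patch plus one [CLS] token."""
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    return num_patches + 1  # +1 for the classification token

print(vit_sequence_length())  # 197 (14 * 14 patches + 1)
```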

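The hyperparameters were tuned during training. The model was fine-tuned with a batch size of 16, chosen to balance computational efficiency against the model's ability to learn from varied images. The learning rate was set to 5e-5, a value intended to balance fast convergence with stable optimization.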
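
As a rough sketch, the two stated hyperparameters could be passed to the Hugging Face `Trainer` API as shown here. Only the batch size and learning rate come from the description above. Treating 16 as the per-device batch size, the output directory name, and the use of `TrainingArguments` at all are assumptions, since the original training script is not shown.

```python
# Values stated in the model description
HYPERPARAMS = {
    "per_device_train_batch_size": 16,  # assumed to be a per-device value
    "learning_rate": 5e-5,
}

def build_training_args(output_dir="vit-nsfw-finetune"):
    """Build TrainingArguments from the stated values (hypothetical output_dir)."""
    # Imported lazily so the config above can be inspected without transformers installed
    from transformers import TrainingArguments
    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```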

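Training used a proprietary dataset of about 80,000 images with high variability, split into two classes, "normal" and "nsfw". Training on this data lets the model pick up the subtle visual patterns needed to separate safe from explicit content.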

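Intended uses and limitations

Intended uses

NSFW image classification: the model is intended mainly for NSFW image classification and can be used to filter explicit or inappropriate content in a range of applications.

Limitations

Task specialization: the model performs well on NSFW image classification, but its performance on other tasks may differ. For other tasks, look on the model hub for a version fine-tuned for that task.

Training data

The training data is a proprietary dataset of about 80,000 images with high variability, divided into "normal" and "nsfw" classes. The goal of training was to let the model reliably distinguish safe from explicit content.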

Training statistics

- eval_loss: 0.07463177293539047
- eval_accuracy: 0.980375
- eval_runtime: 304.9846
- eval_samples_per_second: 52.462
- eval_steps_per_second: 3.279
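
The throughput figures are enough to reconstruct the size of the evaluation run. The following is a small check that uses only the numbers listed above; the results are rounded because the reported runtime and rates are themselves rounded.

```python
eval_runtime = 304.9846        # seconds
samples_per_second = 52.462
steps_per_second = 3.279
eval_accuracy = 0.980375

eval_samples = round(eval_runtime * samples_per_second)  # images evaluated
eval_steps = round(eval_runtime * steps_per_second)      # batches evaluated
batch_size = eval_samples / eval_steps                   # images per batch
correct = round(eval_accuracy * eval_samples)            # correct predictions

print(eval_samples, eval_steps, batch_size, correct)  # 16000 1000 16.0 15686
```

An evaluation batch of 16 matches the stated fine-tuning batch size, and 16,000 images would be about a fifth of the roughly 80,000-image dataset, although the source does not state how the data was split.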