wd-swinv2-tagger-v3-hf: Open-Source Image Tag Classification Model

Wd Swinv2 Tagger V3 Hf

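Developed by p1atdev

An image tag classification model based on the SwinV2 architecture that automatically identifies the content and features of an image.

Image classification

Transformers

Open-source license: Apache-2.0 #anime image tagging #multi-label classification #SwinV2 architecture

Downloads: 8,969

Release date: 3/8/2024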

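Model overview

This is an image classification model based on the SwinV2 architecture. It automatically identifies elements and features in an image, such as objects, scenes, and characters, and outputs the corresponding tags. It is particularly suited to analyzing anime-style (2D) illustrations.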

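Model features

High-accuracy image tag recognition

Identifies a wide range of elements in an image, including characters, clothing, and scenes.

Rating and character tag support

Provides rating tags (rating:) and character tags (character:), which makes content classification easier.

Optimum acceleration support

Can be accelerated with ONNX Runtime through Optimum: roughly 30% faster and roughly 50% smaller, at a slight cost in accuracy.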

Model capabilities

Image content analysis

Automatic tag generation

Anime image recognition

Character feature recognition

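Use cases

Content management

Automatic image tagging

Automatically generates descriptive tags for the content of an image library.

Improves the efficiency of image search and organization.

Anime analysis

Character identification

Identifies specific characters in anime images.

Can be used for organizing fan content and for copyright management.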
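
As an illustration of the image search and organization use case, the sketch below builds a simple inverted index from tags to image files. It assumes the tags for each image have already been produced by the model; the file names, the tag lists, and the helper names `build_tag_index` and `search` are hypothetical and are not part of the model or of any library.

```python
from collections import defaultdict


def build_tag_index(tags_by_image):
    """Map each tag to the set of image paths that carry it."""
    index = defaultdict(set)
    for path, tags in tags_by_image.items():
        for tag in tags:
            index[tag].add(path)
    return index


def search(index, *required_tags):
    """Return the sorted paths of images that carry every requested tag."""
    if not required_tags:
        return []
    matches = [index.get(tag, set()) for tag in required_tags]
    return sorted(set.intersection(*matches))


# Hypothetical library: in practice the tag lists would come from the tagger.
library = {
    "a.webp": ["1girl", "solo", "hat"],
    "b.webp": ["1girl", "outdoors"],
    "c.webp": ["hat", "outdoors"],
}

index = build_tag_index(library)
print(search(index, "1girl", "hat"))  # ['a.webp']
print(search(index, "outdoors"))      # ['b.webp', 'c.webp']
```

A set-based index keeps multi-tag queries cheap, because each query is an intersection of precomputed sets rather than a scan over every image.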

🚀 WD SwinV2 Tagger v3 with 🤗 transformers

Converted from SmilingWolf/wd-swinv2-tagger-v3 to the 🤗 transformers library format. This model is specialized for image classification and assigns a variety of tags to an image.

🚀 Quick start

This section covers the basic steps for using WD SwinV2 Tagger v3, starting with installation.

📦 Installation

pip install transformers

💻 Usage examples

Basic usage

from transformers import pipeline

pipe = pipeline(
    "image-classification",
    model="p1atdev/wd-swinv2-tagger-v3-hf",
    trust_remote_code=True,
)

print(pipe("sample.webp", top_k=15))
#[{'label': '1girl', 'score': 0.9973934888839722},
# {'label': 'solo', 'score': 0.9719744324684143},
# {'label': 'dress', 'score': 0.9539461135864258},
# {'label': 'hat', 'score': 0.9511678218841553},
# {'label': 'outdoors', 'score': 0.9438753128051758},
# ...
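
The pipeline returns a list of `{'label', 'score'}` dictionaries, which often needs to be reduced to a plain tag string. The following is a minimal sketch of that post-processing step. The helper name `to_tag_string` is hypothetical, the 0.35 default mirrors the threshold used in the advanced example, and the low-score `umbrella` entry is made-up sample data.

```python
def to_tag_string(predictions, threshold=0.35, separator=", "):
    """Join the labels scoring at or above `threshold`, highest score first."""
    kept = [p for p in predictions if p["score"] >= threshold]
    kept.sort(key=lambda p: p["score"], reverse=True)
    return separator.join(p["label"] for p in kept)


# Sample data in the same shape as the pipeline output above.
sample = [
    {"label": "1girl", "score": 0.9973934888839722},
    {"label": "solo", "score": 0.9719744324684143},
    {"label": "dress", "score": 0.9539461135864258},
    {"label": "umbrella", "score": 0.12},  # hypothetical low-score entry
]

print(to_tag_string(sample))  # 1girl, solo, dress
```

With the real pipeline, the call would look like `to_tag_string(pipe("sample.webp", top_k=15))`.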

Advanced usage

from PIL import Image

import numpy as np
import torch

from transformers import (
    AutoImageProcessor,
    AutoModelForImageClassification,
)

MODEL_NAME = "p1atdev/wd-swinv2-tagger-v3-hf"

model = AutoModelForImageClassification.from_pretrained(
    MODEL_NAME,
)
processor = AutoImageProcessor.from_pretrained(MODEL_NAME, trust_remote_code=True)

image = Image.open("sample.webp")
inputs = processor.preprocess(image, return_tensors="pt")

with torch.no_grad():
  outputs = model(**inputs.to(model.device, model.dtype))
# apply sigmoid to the first (and only) item in the batch: one independent probability per tag
probs = torch.sigmoid(outputs.logits[0])

# map each tag name to its probability
results = {model.config.id2label[i]: prob.float() for i, prob in enumerate(probs)}
# keep tags above the 35% threshold, sorted by descending probability
results = {
    k: v for k, v in sorted(results.items(), key=lambda item: item[1], reverse=True) if v > 0.35
}
print(results)  # rating tags and character tags are also included
#{'1girl': tensor(0.9974),
# 'solo': tensor(0.9720),
# 'dress': tensor(0.9539),
# 'hat': tensor(0.9512),
# 'outdoors': tensor(0.9439),
# ...
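
Because the `results` dictionary mixes general tags with rating and character tags, it can help to separate them. The sketch below assumes that rating and character labels carry the `rating:` and `character:` prefixes mentioned in the feature list. The helper name `split_by_category` and the sample rating and character entries are hypothetical.

```python
def split_by_category(results):
    """Group tag scores into rating, character, and general tags by label prefix."""
    groups = {"rating": {}, "character": {}, "general": {}}
    for label, score in results.items():
        for prefix in ("rating", "character"):
            marker = prefix + ":"
            if label.startswith(marker):
                # Strip the prefix so only the tag name remains.
                groups[prefix][label[len(marker):]] = float(score)
                break
        else:
            # No known prefix matched, so treat it as a general tag.
            groups["general"][label] = float(score)
    return groups


# Plain floats stand in for the tensors printed above; float() accepts both.
sample = {
    "1girl": 0.9974,
    "solo": 0.9720,
    "rating:general": 0.91,                # hypothetical entry
    "character:example_character": 0.88,   # hypothetical entry
}

groups = split_by_category(sample)
print(groups["rating"])     # {'general': 0.91}
print(groups["character"])  # {'example_character': 0.88}
```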

🚀 Acceleration with Optimum

With 🤗 Optimum, the model can run about 30% faster and be about 50% smaller than the transformers version, at the cost of slightly lower accuracy.

pip install "optimum[onnxruntime]"

-from transformers import pipeline
+from optimum.pipelines import pipeline

pipe = pipeline(
    "image-classification",
    model="p1atdev/wd-swinv2-tagger-v3-hf",
    trust_remote_code=True,
)

print(pipe("sample.webp", top_k=15))
#[{'label': '1girl', 'score': 0.9966088533401489},
# {'label': 'solo', 'score': 0.9740601778030396},
# {'label': 'dress', 'score': 0.9618403911590576},
# {'label': 'hat', 'score': 0.9563733339309692},
# {'label': 'outdoors', 'score': 0.945336639881134},
# ...
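
The speed difference will vary with hardware and input, so it may be worth measuring on your own setup. Below is a minimal timing sketch. The helper names `mean_latency` and `speedup_percent` are hypothetical, "faster" is interpreted here as a reduction in mean latency (an assumption, since the source does not say how the figure was measured), and the stand-in function exists only to keep the example self-contained.

```python
import time


def mean_latency(fn, *args, warmup=1, runs=5):
    """Average wall-clock seconds per call of `fn(*args)` after warm-up calls."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs


def speedup_percent(baseline_seconds, candidate_seconds):
    """Percentage reduction in latency of the candidate relative to the baseline."""
    return (baseline_seconds - candidate_seconds) / baseline_seconds * 100


def fake_pipe(path):
    """Stand-in for a real pipeline call so the example runs without a model."""
    return [{"label": "1girl", "score": 0.99}]


latency = mean_latency(fake_pipe, "sample.webp")
print(f"{latency:.6f} s per call")
```

To compare the two versions, call `mean_latency` once with the transformers pipeline and once with the Optimum pipeline, then pass both results to `speedup_percent`.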