ControlNet-openpose-sdxl-1.0: an open-source image generation model for high-quality images with precise pose control

Controlnet Openpose Sdxl 1.0

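Developed by xinsir

A state-of-the-art ControlNet-openpose-sdxl-1.0 model designed specifically for pose-controlled, high-quality image generation

Image generation | Open-source license: Apache-2.0 | #SDXL pose control #High-precision human pose generation #Anime style adaptation

Downloads: 41.49k

Release date: 5/13/2024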

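Model overview

This model is a ControlNet extension built on Stable Diffusion XL. It specializes in generating high-quality images under human pose control and is particularly well suited to anime and photorealistic styles.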

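Model features

High-precision pose control

Uses an improved OpenPose detector for more accurate human pose control

Multi-resolution support

Supports image generation from low resolution up to very high resolution (4000 px and above)

Style diversity

Generates images in a range of styles, from photorealistic to anime

Performance optimization

Outperforms other open-source OpenPose models on the mAP metric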

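Model capabilities

Image generation from text prompts

Image generation controlled by human pose

High-resolution image generation

Conversion across a variety of art styles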

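Use cases

Art creation

Anime character design

Generate anime-style characters in a specified pose

Produces anime characters with a consistent style and an accurate pose

Photorealistic human scenes

Generate photorealistic scenes that match a specific human pose

Natural-looking poses that blend well with the scene

Concept design

Character prototype design

Quickly generate character prototypes in a variety of poses

Speeds up the design process and offers a wider range of options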

🚀 ControlNet-openpose-sdxl-1.0 model

This is the state-of-the-art ControlNet-openpose-sdxl-1.0 model. Sample results in the Midjourney and anime styles are shown below.

[Two example images]

✨ Key features

controlnet-openpose-sdxl-1.0

Developer: xinsir
Model type: ControlNet_SDXL
License: apache-2.0
Fine-tuned from model: stabilityai/stable-diffusion-xl-base-1.0

Model sources

Paper: https://arxiv.org/abs/2302.05543

Model examples

[Gallery of 20 example images]

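💻 Usage examples

Basic usage

You can get better results by replacing the default pose-drawing function. Thanks to feiyuuu for reporting the issue. With the default pose lines, performance can be unstable because the pose labels used during training were drawn with thicker lines. The mismatch can be fixed as follows.

Find util.py in the controlnet_aux Python package. The path usually looks like this: /your anaconda3 path/envs/your env name/lib/python3.8/site-packages/controlnet_aux/open_pose/util.py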
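If the package is installed somewhere other than the typical Anaconda path, Python itself can report where the file lives. The following is a minimal sketch using only the standard library; the helper name find_module_file is hypothetical, and the module name controlnet_aux.open_pose.util is taken from the path shown above.

```python
import importlib.util
from typing import Optional


def find_module_file(module_name: str) -> Optional[str]:
    """Return the source file path of an importable module, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return None
    if spec is None or spec.origin is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Module name assumed from the path given in the text above.
    path = find_module_file("controlnet_aux.open_pose.util")
    print(path if path else "controlnet_aux was not found in this environment")
```

Run it with the same interpreter (the same conda environment) that will run the pipeline, so that the reported file is the one that is actually imported.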

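Replace the draw_bodypose function with the following code. It relies on names that util.py should already import (math, cv2, numpy as np, List, and Keypoint).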

def draw_bodypose(canvas: np.ndarray, keypoints: List[Keypoint]) -> np.ndarray:
    """
    Draw keypoints and limbs representing body pose on a given canvas.

    Args:
        canvas (np.ndarray): A 3D numpy array representing the canvas (image) on which to draw the body pose.
        keypoints (List[Keypoint]): A list of Keypoint objects representing the body keypoints to be drawn.

    Returns:
        np.ndarray: A 3D numpy array representing the modified canvas with the drawn body pose.

    Note:
        The function expects the x and y coordinates of the keypoints to be normalized between 0 and 1.
    """
    H, W, C = canvas.shape

    
    if max(W, H) < 500:
        ratio = 1.0
    elif max(W, H) >= 500 and max(W, H) < 1000:
        ratio = 2.0
    elif max(W, H) >= 1000 and max(W, H) < 2000:
        ratio = 3.0
    elif max(W, H) >= 2000 and max(W, H) < 3000:
        ratio = 4.0
    elif max(W, H) >= 3000 and max(W, H) < 4000:
        ratio = 5.0
    elif max(W, H) >= 4000 and max(W, H) < 5000:
        ratio = 6.0
    else:
        ratio = 7.0

    stickwidth = 4

    limbSeq = [
        [2, 3], [2, 6], [3, 4], [4, 5], 
        [6, 7], [7, 8], [2, 9], [9, 10], 
        [10, 11], [2, 12], [12, 13], [13, 14], 
        [2, 1], [1, 15], [15, 17], [1, 16], 
        [16, 18],
    ]

    colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
              [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
              [170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]

    for (k1_index, k2_index), color in zip(limbSeq, colors):
        keypoint1 = keypoints[k1_index - 1]
        keypoint2 = keypoints[k2_index - 1]

        if keypoint1 is None or keypoint2 is None:
            continue

        # Note the naming: Y holds pixel x-coordinates and X holds pixel y-coordinates.
        Y = np.array([keypoint1.x, keypoint2.x]) * float(W)
        X = np.array([keypoint1.y, keypoint2.y]) * float(H)
        mX = np.mean(X)
        mY = np.mean(Y)
        length = ((X[0] - X[1]) ** 2 + (Y[0] - Y[1]) ** 2) ** 0.5
        angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
        # Draw the limb as a filled ellipse whose minor semi-axis grows with the canvas size.
        polygon = cv2.ellipse2Poly((int(mY), int(mX)), (int(length / 2), int(stickwidth * ratio)), int(angle), 0, 360, 1)
        cv2.fillConvexPoly(canvas, polygon, [int(float(c) * 0.6) for c in color])

    for keypoint, color in zip(keypoints, colors):
        if keypoint is None:
            continue

        x, y = keypoint.x, keypoint.y
        x = int(x * W)
        y = int(y * H)
        cv2.circle(canvas, (int(x), int(y)), int(4 * ratio), color, thickness=-1)

    return canvas
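To see how much thicker the lines become at a given canvas size, the size buckets from the function above can be reproduced on their own. This is a small sketch that mirrors the if/elif chain with a bisect lookup; the helper names pose_scale and limb_half_thickness are hypothetical and are not part of controlnet_aux.

```python
from bisect import bisect_right

# Exclusive upper bounds (in pixels) of each size bucket, copied from draw_bodypose.
SIZE_THRESHOLDS = [500, 1000, 2000, 3000, 4000, 5000]
BASE_STICKWIDTH = 4  # same value as stickwidth in draw_bodypose


def pose_scale(width: int, height: int) -> float:
    """Return the scale factor that draw_bodypose applies for a canvas of this size."""
    longest_side = max(width, height)
    # Each threshold reached or passed increases the factor by one, starting from 1.0.
    return 1.0 + bisect_right(SIZE_THRESHOLDS, longest_side)


def limb_half_thickness(width: int, height: int) -> int:
    """Return the minor semi-axis, in pixels, of the ellipse drawn for each limb."""
    return int(BASE_STICKWIDTH * pose_scale(width, height))


if __name__ == "__main__":
    for side in (384, 512, 1024, 2048, 4096):
        print(side, pose_scale(side, side), limb_half_thickness(side, side))
```

For a 1024 x 1024 canvas the factor is 3.0, so each limb ellipse has a 12-pixel minor semi-axis and each joint circle a 12-pixel radius.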

Advanced usage

Use the following code to get started with the model.

from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers import DDIMScheduler, EulerAncestralDiscreteScheduler
from controlnet_aux import OpenposeDetector
from PIL import Image
import torch
import numpy as np
import cv2



controlnet_conditioning_scale = 1.0  
prompt = "your prompt, the longer the better, describe the image in as much detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'



eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")


controlnet = ControlNetModel.from_pretrained(
    "xinsir/controlnet-openpose-sdxl-1.0",
    torch_dtype=torch.float16
)

# When testing with a different base model, you also need to change the VAE.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)


pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    safety_checker=None,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
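)
# Assumption, not part of the original snippet: a CUDA-capable GPU is available.
# float16 weights are generally not practical to run on CPU.
pipe.to("cuda")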

processor = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')


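# cv2.imread returns a BGR array, or None if the file cannot be read.
# The detector is normally given RGB input, so consider converting first with
# cv2.cvtColor(controlnet_img, cv2.COLOR_BGR2RGB) if detection looks unreliable.
controlnet_img = cv2.imread("your image path")
controlnet_img = processor(controlnet_img, hand_and_face=False, output_type='cv2')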


# For best performance, resize the image to an area of about 1024 * 1024 pixels, or to the matching bucket resolution.
height, width, _  = controlnet_img.shape
ratio = np.sqrt(1024. * 1024. / (width * height))
new_width, new_height = int(width * ratio), int(height * ratio)
controlnet_img = cv2.resize(controlnet_img, (new_width, new_height))
controlnet_img = Image.fromarray(controlnet_img)

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=controlnet_img,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=new_width,
    height=new_height,
    num_inference_steps=30,
    ).images

# PNG usually preserves image quality better than JPG or WebP, at the cost of a much larger file.
images[0].save("your image save path")
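The resize step above keeps the aspect ratio while targeting an area of 1024 * 1024 pixels, but int() truncation can yield dimensions that are not multiples of 8. Because the SDXL VAE downsamples by a factor of 8, it may be safer to snap both sides to a multiple of 8. The sketch below shows one way to do that; the helper name target_size and the rounding step are assumptions added here, not part of the original snippet.

```python
import math
from typing import Tuple


def target_size(width: int, height: int,
                target_area: int = 1024 * 1024,
                multiple: int = 8) -> Tuple[int, int]:
    """Scale (width, height) to roughly target_area pixels, keeping the aspect ratio.

    Both sides are rounded to the nearest multiple of `multiple` (assumed to be 8 here).
    """
    scale = math.sqrt(target_area / (width * height))
    new_width = max(multiple, round(width * scale / multiple) * multiple)
    new_height = max(multiple, round(height * scale / multiple) * multiple)
    return new_width, new_height


if __name__ == "__main__":
    print(target_size(1920, 1080))
    print(target_size(2048, 2048))
```

The returned pair can be used in place of new_width and new_height, both in the cv2.resize call and in the width and height arguments passed to the pipeline.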