🚀 T2I-Adapter-SDXL - Openpose
T2I-Adapter is a network that provides additional conditioning to Stable Diffusion. Each T2I-Adapter checkpoint takes a different type of conditioning as input and is used with a specific base Stable Diffusion checkpoint.
This checkpoint provides OpenPose conditioning for the Stable Diffusion XL (SDXL) base model. It was a collaborative effort between Tencent ARC and Hugging Face.
✨ Features
- Enables additional conditioning for Stable Diffusion models.
- Supports various types of conditioning inputs.
- Collaboratively developed by Tencent ARC and Hugging Face.
📦 Installation
To get started, first install the required dependencies:
```bash
pip install -U git+https://github.com/huggingface/diffusers.git
pip install -U controlnet_aux==0.0.7
pip install transformers accelerate safetensors
```
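If you want to confirm the environment is set up correctly before running the examples, a quick sanity check such as the following (a minimal sketch, not part of the original instructions) prints the installed versions:

```python
# Minimal environment check: verify the key packages import and report versions.
from importlib.metadata import version

import diffusers
import torch

print("diffusers:", diffusers.__version__)
print("controlnet_aux:", version("controlnet_aux"))
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```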
💻 Usage Examples
Basic Usage
- Input images are first converted into the appropriate control image format (here, an OpenPose skeleton).
- The control image and prompt are then passed to the `StableDiffusionXLAdapterPipeline`.
Let's have a look at a simple example using the OpenPose adapter.
```python
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler, AutoencoderKL
from diffusers.utils import load_image, make_image_grid
from controlnet_aux import OpenposeDetector
import torch
import numpy as np
from PIL import Image

# Load the OpenPose adapter weights.
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-openpose-sdxl-1.0", torch_dtype=torch.float16
).to("cuda")

# Set up the SDXL pipeline with the adapter, an fp16-friendly VAE, and an Euler Ancestral scheduler.
model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

# Extract an OpenPose skeleton from the input photo to use as the control image.
open_pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/people.jpg"
image = load_image(url)
image = open_pose(image, detect_resolution=512, image_resolution=1024)
image = np.array(image)[:, :, ::-1]  # swap the RGB/BGR channel order
image = Image.fromarray(np.uint8(image))

prompt = "A couple, 4k photo, highly detailed"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"

# Generate an image conditioned on both the prompt and the pose skeleton.
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=1,
    guidance_scale=7.5,
).images[0]
gen_images.save('out_pose.png')
```
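If GPU memory is tight, a common alternative (sketched below, reusing the objects defined in the example above) is to rely on diffusers' model CPU offloading instead of moving the whole pipeline to CUDA; `enable_model_cpu_offload()` is a standard pipeline method and requires `accelerate` to be installed:

```python
# Memory-friendly variant: instead of pipe.to("cuda"), let diffusers move each
# submodule to the GPU only while it is needed (requires accelerate).
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a,
    torch_dtype=torch.float16, variant="fp16",
)
pipe.enable_model_cpu_offload()
# The pipe(...) call is then identical to the one above.
```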
Advanced Usage
The model is trained on 3M high-resolution image-text pairs from LAION-Aesthetics V2 with specific training parameters; you can refer to the training section below for more details.
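At inference time, the main knobs exposed by the pipeline are `adapter_conditioning_scale` (how strongly the control image steers generation) and, in recent diffusers releases, `adapter_conditioning_factor` (the fraction of denoising steps during which the adapter is applied). The sketch below reuses the pipeline and control image from the basic example; the values are illustrative, not tuned:

```python
# Looser pose guidance: weaker conditioning, applied only during the first
# half of the denoising steps (illustrative values, reusing pipe/image/prompts
# from the basic example above).
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    adapter_conditioning_factor=0.5,
    guidance_scale=7.5,
).images[0]
gen_images.save('out_pose_loose.png')
```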
📚 Documentation
Model Details
| Property | Details |
| --- | --- |
| Developed by | T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models |
| Model Type | Diffusion-based text-to-image generation model |
| Language(s) | English |
| License | Apache 2.0 |
| Resources for more information | GitHub Repository, Paper. |
| Model complexity | |
| Cite as | `@misc{ title={T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models}, author={Chong Mou and Xintao Wang and Liangbin Xie and Yanze Wu and Jian Zhang and Zhongang Qi and Ying Shan and Xiaohu Qie}, year={2023}, eprint={2302.08453}, archivePrefix={arXiv}, primaryClass={cs.CV} }` |
Checkpoints
| Model Name | Control Image Overview |
| --- | --- |
| [TencentARC/t2i-adapter-canny-sdxl-1.0](https://huggingface.co/TencentARC/t2i-adapter-canny-sdxl-1.0)<br>Trained with canny edge detection | A monochrome image with white edges on a black background. |
| [TencentARC/t2i-adapter-sketch-sdxl-1.0](https://huggingface.co/TencentARC/t2i-adapter-sketch-sdxl-1.0)<br>Trained with PidiNet edge detection | A hand-drawn monochrome image with white outlines on a black background. |
| [TencentARC/t2i-adapter-lineart-sdxl-1.0](https://huggingface.co/TencentARC/t2i-adapter-lineart-sdxl-1.0)<br>Trained with lineart edge detection | A hand-drawn monochrome image with white outlines on a black background. |
| [TencentARC/t2i-adapter-depth-midas-sdxl-1.0](https://huggingface.co/TencentARC/t2i-adapter-depth-midas-sdxl-1.0)<br>Trained with Midas depth estimation | A grayscale image with black representing deep areas and white representing shallow areas. |
| [TencentARC/t2i-adapter-depth-zoe-sdxl-1.0](https://huggingface.co/TencentARC/t2i-adapter-depth-zoe-sdxl-1.0)<br>Trained with Zoe depth estimation | A grayscale image with black representing deep areas and white representing shallow areas. |
| [TencentARC/t2i-adapter-openpose-sdxl-1.0](https://huggingface.co/TencentARC/t2i-adapter-openpose-sdxl-1.0)<br>Trained with OpenPose bone image | An [OpenPose bone](https://github.com/CMU-Perceptual-Computing-Lab/openpose) image. |
Training
Our training script was built on top of the official training script that we provide here.
The model is trained on 3M high-resolution image-text pairs from LAION-Aesthetics V2 with:
- Training steps: 35000
- Batch size: data parallel with a single-GPU batch size of 16, for a total batch size of 256.
- Learning rate: constant learning rate of 1e-5.
- Mixed precision: fp16
📄 License
This model is licensed under the Apache 2.0 license.