T2I-Adapter-SDXL - Depth-Zoe
T2I Adapter is a network that provides additional conditioning for Stable Diffusion. Each T2I checkpoint takes a different type of conditioning as input and is used with a specific base Stable Diffusion checkpoint. This checkpoint offers depth conditioning for the StableDiffusionXL checkpoint. It is a collaborative effort between Tencent ARC and Hugging Face.
Quick Start
To get started, first install the required dependencies:
pip install -U git+https://github.com/huggingface/diffusers.git
pip install -U controlnet_aux==0.0.7 timm==0.6.12
pip install transformers accelerate safetensors
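Before moving on, it can help to confirm that the packages named in the pip commands above are actually visible to the Python environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these libraries.

```python
from importlib import metadata

# Package names taken from the pip commands above.
REQUIRED = ["diffusers", "controlnet_aux", "timm", "transformers", "accelerate", "safetensors"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A `None` entry suggests the corresponding pip command did not run in the active environment.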
- Download the input image and convert it to the appropriate control image format (for this checkpoint, a depth map).
- Pass the control image and prompt to the StableDiffusionXLAdapterPipeline.
⨠Features
- Additional Conditioning: T2I Adapter provides extra conditioning to Stable Diffusion, enabling more controllable image generation.
- Depth Conditioning: This checkpoint offers depth conditioning for the StableDiffusionXL checkpoint.
- Multiple Checkpoints: There are multiple checkpoints available, each taking a different type of conditioning as input.
Installation
The installation steps are included in the Quick Start section.
Usage Examples
Basic Usage
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler, AutoencoderKL
from diffusers.utils import load_image, make_image_grid
from controlnet_aux import ZoeDetector
import torch

# Load the depth-zoe adapter weights in half precision
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-depth-zoe-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Load the SDXL base pipeline with an Euler ancestral scheduler and an fp16-safe VAE
model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

# Load the ZoeDepth detector that turns an input image into a depth map
zoe_depth = ZoeDetector.from_pretrained(
    "valhalla/t2iadapter-aux-models", filename="zoed_nk.pth", model_type="zoedepth_nk"
).to("cuda")
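The call to `enable_xformers_memory_efficient_attention()` above can fail if the optional `xformers` package is not installed, and it is not listed in the Quick Start dependencies. One possible way to handle this is a small wrapper that falls back to the pipeline's default attention. This is a sketch only; the helper name `enable_memory_efficient_attention` is hypothetical, and catching a broad `Exception` is an assumption made to keep the example short.

```python
def enable_memory_efficient_attention(pipe):
    """Try to enable xformers attention on `pipe`.

    Returns True if the call succeeded and False if it raised,
    in which case the pipeline keeps its default attention.
    """
    try:
        pipe.enable_xformers_memory_efficient_attention()
    except Exception as exc:  # for example, xformers is not installed
        print(f"xformers attention not enabled ({exc}); using default attention")
        return False
    return True
```

With this helper, `enable_memory_efficient_attention(pipe)` could stand in for the direct method call in the example above.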
Advanced Usage
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_zeo.jpg"
image = load_image(url)
image = zoe_depth(image, gamma_corrected=True, detect_resolution=512, image_resolution=1024)
prompt = "A photo of a orchid, 4k photo, highly detailed"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=1,
    guidance_scale=7.5,
).images[0]
gen_images.save('out_zoe.png')
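The `adapter_conditioning_scale` argument used above controls how strongly the depth map influences the result. To see its effect, one option is to run the same prompt at several scales. The sketch below assumes the `pipe`, `prompt`, `negative_prompt`, and `image` objects from the examples above; the helper name `sweep_adapter_scale` and the chosen scale values are hypothetical.

```python
def sweep_adapter_scale(pipe, prompt, negative_prompt, image, scales=(0.5, 0.75, 1.0)):
    """Generate one image per adapter_conditioning_scale value.

    Returns a dict mapping each scale to the first generated image.
    """
    results = {}
    for scale in scales:
        output = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image=image,
            num_inference_steps=30,
            adapter_conditioning_scale=scale,
            guidance_scale=7.5,
        )
        results[scale] = output.images[0]
    return results
```

Each call samples fresh noise, so differences between outputs also reflect the random seed. Passing a seeded `generator` to the pipeline would help isolate the effect of the scale.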
Documentation
Model Details
| Property | Details |
| Developed by | T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models |
| Model Type | Diffusion-based text-to-image generation model |
| Language(s) | English |
| License | Apache 2.0 |
| Resources for more information | GitHub Repository, Paper. |
| Model complexity | |
| Cite as | @misc{ title={T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models}, author={Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, Xiaohu Qie}, year={2023}, eprint={2302.08453}, archivePrefix={arXiv}, primaryClass={cs.CV} } |
Checkpoints
Training
Our training script was built on top of the official training script that we provide here.
The model is trained on 3M high-resolution image-text pairs from LAION-Aesthetics V2 with:
- Training steps: 25000
- Batch size: Data parallel with a single GPU batch size of 16 for a total batch size of 256.
- Learning rate: Constant learning rate of 1e-5.
- Mixed precision: fp16
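The figures above imply a few derived quantities, worked out in the short calculation below. It assumes that the total batch size comes purely from data parallelism (no gradient accumulation) and treats "3M" as exactly 3,000,000 pairs; both are assumptions, so the results are rough estimates only.

```python
# Values stated in the training summary above.
per_gpu_batch = 16
total_batch = 256
training_steps = 25_000
dataset_size = 3_000_000  # "3M" pairs, taken as an exact count (assumption)

# Assumption: no gradient accumulation, so each worker contributes one per-GPU batch.
data_parallel_workers = total_batch // per_gpu_batch

# Total image-text pairs processed over the whole run.
samples_seen = training_steps * total_batch

# Approximate number of passes over the dataset.
approx_epochs = samples_seen / dataset_size

print(f"data-parallel workers: {data_parallel_workers}")
print(f"samples seen: {samples_seen:,}")
print(f"approximate epochs: {approx_epochs:.2f}")
```

Under these assumptions, the run corresponds to 16 data-parallel workers and roughly two passes over the training set.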
License
This model is licensed under the Apache 2.0 license.