t2i-adapter-canny-sdxl-1.0 Open-Source Model - Providing Canny Edge Detection Control for Stable Diffusion XL

T2i Adapter Canny Sdxl 1.0

Developed by TencentARC

An adapter model providing Canny edge detection condition control for Stable Diffusion XL, jointly developed by Tencent ARC and Hugging Face

Category: Image Generation, Other
Open Source License: Apache-2.0
Tags: #Canny edge control #SDXL adapter #Image structure generation

Downloads 6,795

Release Time: 9/3/2023

Model Overview

T2I-Adapter is a network that provides additional conditional control for Stable Diffusion. This checkpoint provides Canny edge conditioning for Stable Diffusion XL.

Model Features

Lightweight adapter

Achieves precise control over SDXL model with only 77M parameters

Edge-preserving generation

Maintains structural features of input images through Canny edge detection

Multi-condition control

Supports combination with other T2I adapters for multi-condition control

Model Capabilities

Edge map-based image generation

Image-to-image translation

Structure-preserving image synthesis

Use Cases

Creative design

Concept art creation

Automatically generates complete artworks from sketches, preserving the structure of the original sketch

Image editing

Image style transfer

Changes the artistic style while maintaining the original image structure, producing images with consistent structure but different styles

🚀 T2I-Adapter-SDXL - Canny

T2I-Adapter is a network that provides additional conditioning to Stable Diffusion. Each T2I-Adapter checkpoint takes a different type of conditioning as input and is used with a specific base Stable Diffusion checkpoint. This checkpoint provides conditioning on Canny edges for the Stable Diffusion XL checkpoint. It is a collaboration between Tencent ARC and Hugging Face.

🚀 Quick Start

To get started, first install the required dependencies:

pip install -U git+https://github.com/huggingface/diffusers.git
pip install -U controlnet_aux==0.0.7 # for conditioning models and detectors  
pip install transformers accelerate safetensors

First, download the input image and convert it into the appropriate control image format (here, a Canny edge map).
Then, pass the control image and the prompt to the StableDiffusionXLAdapterPipeline.

💻 Usage Examples

Basic Usage

Let's have a look at a simple example using the Canny Adapter.

Dependency

from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler, AutoencoderKL
from diffusers.utils import load_image, make_image_grid
from controlnet_aux.canny import CannyDetector
import torch

# load adapter
adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16").to("cuda")

# load euler_a scheduler
model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
# Optional: requires the xformers package to be installed
pipe.enable_xformers_memory_efficient_attention()

canny_detector = CannyDetector()

Condition Image

url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg"
image = load_image(url)

# Detect the Canny edge map at low resolution to avoid high-frequency details,
# then return it at the 1024 resolution used for generation
image = canny_detector(image, detect_resolution=384, image_resolution=1024)

Generation

prompt = "Mystical fairy in real, magic, 4k picture, high quality"
negative_prompt = "extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured"

gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    guidance_scale=7.5,
    adapter_conditioning_scale=0.8,
    adapter_conditioning_factor=1,
).images[0]
gen_images.save('out_canny.png')
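The two adapter arguments in the call above can be illustrated with a small pure-Python sketch. It assumes, following my reading of the diffusers documentation, that `adapter_conditioning_scale` multiplies the adapter features before they are added to the UNet, and that `adapter_conditioning_factor` is the fraction of denoising steps during which the adapter is applied. The helper names below are hypothetical and are not part of diffusers; check the documentation of your installed version for the exact behaviour.

```python
from typing import List


def adapter_active_steps(num_inference_steps: int, adapter_conditioning_factor: float) -> List[bool]:
    """Hypothetical helper: which denoising steps use the adapter features.

    Assumes the adapter is applied to the first
    int(num_inference_steps * adapter_conditioning_factor) steps only.
    """
    cutoff = int(num_inference_steps * adapter_conditioning_factor)
    return [step < cutoff for step in range(num_inference_steps)]


def scale_adapter_features(features: List[float], adapter_conditioning_scale: float) -> List[float]:
    """Hypothetical helper: adapter features are multiplied by the scale."""
    return [value * adapter_conditioning_scale for value in features]


# With the settings from the example (30 steps, factor 1), every step is conditioned.
full = adapter_active_steps(30, 1)
# With a factor of 0.5, only the first half of the steps would be conditioned.
half = adapter_active_steps(30, 0.5)
print(sum(full), sum(half))

# A scale of 0.8 weakens the (toy) adapter features relative to 1.0.
print(scale_adapter_features([1.0, 2.0], 0.8))
```

Under these assumptions, lowering the factor lets the later denoising steps run without edge guidance, while lowering the scale weakens the guidance at every step where it is applied.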

📚 Documentation

Model Details

Property	Details
Developed by	Tencent ARC and Hugging Face, based on the paper "T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models"
Model Type	Diffusion-based text-to-image generation model
Language(s)	English
License	Apache 2.0
Resources for more information	GitHub Repository, Paper.
Model complexity
Cite as	@misc{ title={T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models}, author={Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, Xiaohu Qie}, year={2023}, eprint={2302.08453}, archivePrefix={arXiv}, primaryClass={cs.CV} }

Checkpoints

Model Name	Trained With	Control Image Overview
TencentARC/t2i-adapter-canny-sdxl-1.0	Canny edge detection	A monochrome image with white edges on a black background.
TencentARC/t2i-adapter-sketch-sdxl-1.0	PidiNet edge detection	A hand-drawn monochrome image with white outlines on a black background.
TencentARC/t2i-adapter-lineart-sdxl-1.0	Lineart edge detection	A hand-drawn monochrome image with white outlines on a black background.
TencentARC/t2i-adapter-depth-midas-sdxl-1.0	Midas depth estimation	A grayscale image with black representing deep areas and white representing shallow areas.
TencentARC/t2i-adapter-depth-zoe-sdxl-1.0	Zoe depth estimation	A grayscale image with black representing deep areas and white representing shallow areas.
TencentARC/t2i-adapter-openpose-sdxl-1.0	OpenPose bone image	An OpenPose bone image.
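
When switching between the checkpoints listed above, it can help to keep the repository IDs in one place. The following is a minimal sketch: the dictionary keys and the `adapter_repo_id` helper are hypothetical names chosen for illustration, and only the repository IDs themselves come from the table.

```python
# Repository IDs taken from the checkpoint table; the short keys are illustrative.
T2I_SDXL_ADAPTERS = {
    "canny": "TencentARC/t2i-adapter-canny-sdxl-1.0",
    "sketch": "TencentARC/t2i-adapter-sketch-sdxl-1.0",
    "lineart": "TencentARC/t2i-adapter-lineart-sdxl-1.0",
    "depth-midas": "TencentARC/t2i-adapter-depth-midas-sdxl-1.0",
    "depth-zoe": "TencentARC/t2i-adapter-depth-zoe-sdxl-1.0",
    "openpose": "TencentARC/t2i-adapter-openpose-sdxl-1.0",
}


def adapter_repo_id(condition: str) -> str:
    """Hypothetical helper: map a condition type to its checkpoint repository ID."""
    key = condition.strip().lower()
    if key not in T2I_SDXL_ADAPTERS:
        known = ", ".join(sorted(T2I_SDXL_ADAPTERS))
        raise ValueError(f"Unknown condition {condition!r}; expected one of: {known}")
    return T2I_SDXL_ADAPTERS[key]


print(adapter_repo_id("canny"))
```

The returned string can be passed to `T2IAdapter.from_pretrained` in place of the hard-coded ID used in the usage example, provided the control image is produced by the matching detector.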

Training

Our training script was built on top of the official training script.

The model is trained on 3M high-resolution image-text pairs from LAION-Aesthetics V2 with:

Training steps: 20000
Batch size: Data parallel, with a per-GPU batch size of 16 and a total batch size of 256.
Learning rate: Constant learning rate of 1e-5.
Mixed precision: fp16
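
The figures above imply a few derived quantities, worked out in the short sketch below. It assumes no gradient accumulation (so the total batch size is simply the per-GPU batch size times the number of data-parallel workers) and that every training step consumes one full global batch; neither assumption is stated in the training details.

```python
# Values stated in the training details.
per_gpu_batch_size = 16
total_batch_size = 256
training_steps = 20_000
dataset_size = 3_000_000  # "3M" image-text pairs

# Assumption: no gradient accumulation, so workers = total / per-GPU batch size.
data_parallel_workers = total_batch_size // per_gpu_batch_size

# Assumption: each step processes exactly one global batch.
samples_seen = training_steps * total_batch_size
approx_epochs = samples_seen / dataset_size

print(data_parallel_workers)    # 16
print(samples_seen)             # 5120000
print(round(approx_epochs, 2))  # 1.71
```

Under these assumptions, training used 16 data-parallel workers and made roughly 1.7 passes over the 3M-pair dataset.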

📄 License

This model is licensed under the Apache 2.0 license.
