ControlNet v1.1 Open-Source Model - Generate Images for Free with Stable Diffusion Using Depth Maps

Control V11f1p Sd15 Depth

Developed by frankjoshua

ControlNet v1.1 is the successor model to ControlNet v1.0, controlling Stable Diffusion image generation through depth image conditions.

Image Generation, Other | Open Source License: OpenRAIL
#Depth Map Controlled Generation #Multi-Preprocessor Compatibility #Image Enhancement and Restoration

Downloads 28

Release Time: 7/27/2023

Model Overview

ControlNet is a neural network structure that controls diffusion models by adding additional conditions. This checkpoint corresponds to the ControlNet based on depth image conditions and can be used in conjunction with Stable Diffusion.

Model Features

Depth Condition Control

Uses depth images as conditional input to precisely control the geometric structure and spatial relationships of generated images.

Improved Training Data

Fixed issues in the v1.0 training dataset, reducing bias and improving model robustness.

Data Augmentation

Applied data augmentation techniques such as random left-right flipping to enhance model generalization.
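
A minimal sketch of what random left-right flipping could look like for a depth-conditioned training pair. The helper name `random_hflip_pair`, the probability `p`, and the toy arrays are illustrative assumptions; the source does not describe the actual training code. The point it shows is that the image and its depth map have to be mirrored together so they stay aligned.

```python
import numpy as np


def random_hflip_pair(image, depth, rng, p=0.5):
    """Flip an image and its depth map left-right together with probability p."""
    if rng.random() < p:
        # np.fliplr reverses the width axis; both arrays are flipped in the
        # same call so the depth map still lines up with the image.
        return np.fliplr(image).copy(), np.fliplr(depth).copy()
    return image, depth


rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)  # toy H x W x C image
depth = np.arange(2 * 3).reshape(2, 3)         # toy H x W depth map

# p=1.0 forces the flip so the effect is visible on the toy data.
flipped_image, flipped_depth = random_hflip_pair(image, depth, rng, p=1.0)
print(flipped_depth)
```

Mirroring leaves the depth values themselves unchanged and only moves them to the mirrored column, which is why a plain flip is enough for this kind of condition.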

Compatibility with Multiple Depth Estimation Methods

Works with depth maps produced at different preprocessor resolutions and by different depth estimation methods (e.g., MiDaS, LeReS, and Zoe).
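
As an illustration of what a "preprocessor resolution" setting typically controls, the sketch below computes the size an input image would be resized to before depth estimation. The helper `fit_resolution`, its default of 512, and snapping to multiples of 64 are assumptions chosen for the example; they are not taken from this model card.

```python
def fit_resolution(height, width, resolution=512, multiple=64):
    """Scale (height, width) so the short side is about `resolution`,
    then snap both sides to the nearest multiple of `multiple`."""
    scale = resolution / min(height, width)

    def snap(value):
        # Round half up, and never return less than one multiple.
        return max(multiple, int(value * scale / multiple + 0.5) * multiple)

    return snap(height), snap(width)


print(fit_resolution(480, 640))       # (512, 704)
print(fit_resolution(480, 640, 384))  # (384, 512)
```

Running the depth estimator at either size yields a usable control image; the feature described above is that the checkpoint tolerates such differences.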

Model Capabilities

Depth map-based image generation

Image-to-image translation

Geometric structure preservation

3D scene generation

Use Cases

Artistic Creation

3D Scene Generation

Generates 3D scene images with correct perspective and spatial relationships based on depth maps.

Generated images maintain the geometric structure of the input depth map.

Architectural Design

Architectural Visualization

Generates detailed architectural renderings from simple depth sketches.

Quickly transforms conceptual designs into realistic images.

🚀 ControlNet - v1.1 - depth Version

ControlNet v1.1 adds conditional control to diffusion models. Alongside the text prompt, the user supplies an input condition, in this case a depth image, which constrains the spatial layout of the generated image and gives more precise control over text-to-image output.

🚀 Quick Start

ControlNet v1.1 is the successor to ControlNet v1.0 and was released in lllyasviel/ControlNet-v1-1 by Lvmin Zhang.

This checkpoint is a conversion of the original checkpoint into diffusers format. It can be used in combination with Stable Diffusion, such as runwayml/stable-diffusion-v1-5.

For more details, please also have a look at the 🧨 Diffusers docs.

ControlNet is a neural network structure to control diffusion models by adding extra conditions.

This checkpoint corresponds to the ControlNet conditioned on depth images.
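
To make the depth condition concrete, here is a small sketch that turns a raw floating-point depth map into the kind of 8-bit, three-channel array that can be saved as a grayscale-looking control image. The helper name `depth_to_control_array` is hypothetical and not part of diffusers; min-max scaling is one common choice, assumed here for illustration, and the near/far polarity of the input is left unchanged.

```python
import numpy as np


def depth_to_control_array(depth):
    """Scale a 2-D depth array to 0..255 and repeat it across 3 channels."""
    depth = np.asarray(depth, dtype=np.float32)
    span = float(depth.max() - depth.min())
    if span > 0:
        normalized = (depth - depth.min()) / span
    else:
        # A constant map has no structure to preserve; emit all zeros.
        normalized = np.zeros_like(depth)
    gray = (normalized * 255.0).round().astype(np.uint8)
    # Stack the same channel three times to get an H x W x 3 array.
    return np.stack([gray, gray, gray], axis=-1)


control = depth_to_control_array([[0.0, 1.0], [3.0, 4.0]])
print(control.shape, control.dtype)  # (2, 2, 3) uint8
print(control[..., 0])
```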

✨ Features

Conditional Control: ControlNet allows diffusion models to be controlled by adding extra conditions, such as depth images, edge maps, and keypoints.
End-to-End Learning: The model learns task-specific conditions in an end-to-end way, making it robust even with small training datasets.
Fast Training: Training a ControlNet is as fast as fine-tuning a diffusion model and can be done on personal devices.

📦 Installation

Let's install diffusers and related packages:

$ pip install diffusers transformers accelerate
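
After installing, it can help to confirm which versions ended up in the environment. The snippet below is a small convenience sketch using only the standard library; the function name `installed_versions` is made up for this example.

```python
from importlib import metadata


def installed_versions(packages=("diffusers", "transformers", "accelerate")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```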

💻 Usage Examples

Basic Usage

import os

import numpy as np
import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.utils import load_image
from PIL import Image
from transformers import pipeline

# Depth checkpoint described in this card (listed in the checkpoint table below).
checkpoint = "lllyasviel/control_v11f1p_sd15_depth"

image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_depth/resolve/main/images/input.png"
)

prompt = "Stormtrooper's lecture in beautiful lecture hall"

# Estimate depth, then repeat the single channel three times to build
# a three-channel control image.
depth_estimator = pipeline("depth-estimation")
depth = np.array(depth_estimator(image)["depth"])
depth = depth[:, :, None]
depth = np.concatenate([depth, depth, depth], axis=2)
control_image = Image.fromarray(depth)

# Create the output directory first; saving fails if it does not exist.
os.makedirs("images", exist_ok=True)
control_image.save("images/control.png")

# Load the ControlNet and attach it to Stable Diffusion v1-5 in half precision.
controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# Offload idle sub-models to the CPU to reduce GPU memory use.
pipe.enable_model_cpu_offload()

# Fix the seed so the result is reproducible.
generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=30, generator=generator, image=control_image).images[0]

image.save("images/image_out.png")
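
How strongly the depth map steers the result can be adjusted at call time. The sketch below wraps the pipeline call from Basic Usage and passes `controlnet_conditioning_scale`, a parameter of the diffusers ControlNet pipeline that scales the ControlNet outputs before they are added to the UNet. The wrapper `generate_with_scale` and its defaults are illustrative and not part of the original card.

```python
def generate_with_scale(pipe, prompt, control_image, scale=1.0, steps=30):
    """Run a ControlNet pipeline with an explicit conditioning strength."""
    result = pipe(
        prompt,
        image=control_image,
        num_inference_steps=steps,
        # 1.0 applies the ControlNet outputs at full strength; smaller
        # values give the text prompt more influence than the depth map.
        controlnet_conditioning_scale=scale,
    )
    return result.images[0]
```

With the `pipe`, `prompt`, and `control_image` objects from Basic Usage, calling `generate_with_scale(pipe, prompt, control_image, scale=0.5)` would produce a variant that follows the depth map less strictly.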


📚 Documentation

Model Details

Property	Details
Developed by	Lvmin Zhang, Maneesh Agrawala
Model Type	Diffusion-based text-to-image generation model
Language(s)	English
License	The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which this license is based.
Resources for more information	GitHub Repository, Paper.
Cite as	@misc{zhang2023adding, title={Adding Conditional Control to Text-to-Image Diffusion Models}, author={Lvmin Zhang and Maneesh Agrawala}, year={2023}, eprint={2302.05543}, archivePrefix={arXiv}, primaryClass={cs.CV} }

Introduction

ControlNet was proposed in "Adding Conditional Control to Text-to-Image Diffusion Models" by Lvmin Zhang and Maneesh Agrawala.

The abstract reads as follows:

We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.

Other released checkpoints v1-1

The authors released 14 different checkpoints, each trained with Stable Diffusion v1-5 on a different type of conditioning:

Model Name	Control Image Overview	Condition Image
lllyasviel/control_v11p_sd15_canny	Trained with canny edge detection	A monochrome image with white edges on a black background.
lllyasviel/control_v11e_sd15_ip2p	Trained with pixel-to-pixel instruction	No condition.
lllyasviel/control_v11p_sd15_inpaint	Trained with image inpainting	No condition.
lllyasviel/control_v11p_sd15_mlsd	Trained with multi-level line segment detection	An image with annotated line segments.
lllyasviel/control_v11f1p_sd15_depth	Trained with depth estimation	An image with depth information, usually represented as a grayscale image.
lllyasviel/control_v11p_sd15_normalbae	Trained with surface normal estimation	An image with surface normal information, usually represented as a color-coded image.
lllyasviel/control_v11p_sd15_seg	Trained with image segmentation	An image with segmented regions, usually represented as a color-coded image.
lllyasviel/control_v11p_sd15_lineart	Trained with line art generation	An image with line art, usually black lines on a white background.
lllyasviel/control_v11p_sd15s2_lineart_anime	Trained with anime line art generation	An image with anime-style line art.
lllyasviel/control_v11p_sd15_openpose	Trained with human pose estimation	An image with human poses, usually represented as a set of keypoints or skeletons.
lllyasviel/control_v11p_sd15_scribble	Trained with scribble-based image generation	An image with scribbles, usually random or user-drawn strokes.