ControlNet v1.1 Open-source Image Control Model - Free Deployment of Image Generation Based on Image Condition Control

Control V11e Sd15 Ip2p

Developed by lllyasviel

ControlNet v1.1 is a neural network structure developed by Lvmin Zhang, designed to control large pre-trained diffusion models through additional conditions. This checkpoint was trained for instruct-pix2pix-style conditioning: the image to be edited serves as the control image, and the prompt is an editing instruction.

Image Generation | Other | Open Source License: OpenRAIL | #Image Instruction Editing #Pixel-level Control #Stable Diffusion Enhancement

Downloads 3,603

Release Time: 4/14/2023

Model Overview

ControlNet is a neural network structure that controls diffusion models by adding extra conditions. This checkpoint is the ControlNet conditioned on instruct pix2pix inputs (an image plus an editing instruction) and is used together with Stable Diffusion.

Model Features

Conditional Control

Capable of controlling large pre-trained diffusion models through additional input conditions.

Efficient Training

Learning stays robust even on small datasets (fewer than 50k samples), and training is about as fast as fine-tuning a diffusion model.

Flexible Application

Can be trained on personal devices or scaled up for large-scale data training.

Multi-Condition Support

Supports various conditional inputs such as edge maps, segmentation maps, and keypoints.

Model Capabilities

Image-to-Image Translation

Instruction-Based Image Editing

Conditional Image Generation

Use Cases

Creative Design

Image Style Transfer

Modify image styles based on text instructions.

For example, the instruction "make it on fire" re-renders a regular photo as if it were burning.

Image Content Editing

Add or modify elements in an image based on instructions.

Artistic Creation

Artistic Effect Generation

Generate images with specific artistic styles.

🚀 ControlNet - v1.1 - instruct pix2pix Version

ControlNet is a neural network structure that enables the control of diffusion models by incorporating additional conditions. This specific checkpoint, converted from the original into the diffusers format, can be used in conjunction with Stable Diffusion to generate images based on various conditional inputs.

🚀 Quick Start

Installation

Let's install diffusers and related packages:

$ pip install diffusers transformers accelerate

Usage

Run code:

import os

import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.utils import load_image

checkpoint = "lllyasviel/control_v11e_sd15_ip2p"

# For this checkpoint the control image is the unprocessed photo to be edited.
image = load_image("https://huggingface.co/lllyasviel/control_v11e_sd15_ip2p/resolve/main/images/input.png").convert("RGB")

# The prompt is an editing instruction rather than a scene description.
prompt = "make it on fire"

# Load the ControlNet and attach it to Stable Diffusion v1-5 in half precision.
controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# Keep idle submodules on the CPU to reduce GPU memory use.
pipe.enable_model_cpu_offload()

# Fix the seed so the result is reproducible.
generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=30, generator=generator, image=image).images[0]

# Create the output directory first; saving does not create missing folders.
os.makedirs("images", exist_ok=True)
image.save("images/image_out.png")
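When editing your own photos instead of the sample image, it can help to bring them close to the resolution Stable Diffusion v1-5 was trained at (512 pixels) before passing them to the pipeline. The helper below is a minimal sketch of that idea, not part of the original model card: the name `fit_size`, the 512-pixel target, and the choice to round down to a multiple of 8 are all assumptions made for illustration.

```python
def fit_size(width, height, short_side=512, multiple=8):
    """Scale (width, height) so the shorter side is about `short_side`,
    then round each side down to a multiple of `multiple`.

    Hypothetical helper: the defaults are assumptions, not values
    prescribed by the model card.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = short_side / min(width, height)
    # Round to the nearest pixel, then snap down to the required multiple.
    new_w = max(multiple, int(round(width * scale)) // multiple * multiple)
    new_h = max(multiple, int(round(height * scale)) // multiple * multiple)
    return new_w, new_h
```

With a PIL image, whose `size` attribute is `(width, height)`, this could be applied as `image = image.resize(fit_size(*image.size))` before calling the pipeline.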


✨ Features

Conditional Control: ControlNet allows diffusion models like Stable Diffusion to accept conditional inputs such as edge maps, segmentation maps, keypoints, etc., enriching the control methods for large diffusion models.
Fast Training: Training a ControlNet is as fast as fine-tuning a diffusion model and can be done on personal devices with small datasets or scaled to large amounts of data with powerful computation clusters.

📚 Documentation

Model Details

Property	Details
Developed by	Lvmin Zhang, Maneesh Agrawala
Model Type	Diffusion-based text-to-image generation model
Language(s)	English
License	The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license, on which this license is based.
Resources for more information	GitHub Repository, Paper.
Cite as	@misc{zhang2023adding, title={Adding Conditional Control to Text-to-Image Diffusion Models}, author={Lvmin Zhang and Maneesh Agrawala}, year={2023}, eprint={2302.05543}, archivePrefix={arXiv}, primaryClass={cs.CV} }

Introduction

ControlNet was proposed in Adding Conditional Control to Text-to-Image Diffusion Models by Lvmin Zhang and Maneesh Agrawala.

The abstract reads as follows:

We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.
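The abstract does not spell out how the extra condition enters the network. As described in the paper (and not on this page), a trainable copy of a pretrained block is attached to the locked original through zero-initialized convolution layers, so that training starts from the unmodified model. The toy sketch below illustrates only that starting behavior: plain Python lists stand in for feature maps, the scalar weights `w_in` and `w_out` stand in for the zero convolutions, and every function name is hypothetical.

```python
def frozen_block(x):
    # Stand-in for a locked pretrained layer (weights never change).
    return [2.0 * v + 1.0 for v in x]


def trainable_copy(x):
    # Stand-in for the trainable copy; it starts out identical to the frozen block.
    return [2.0 * v + 1.0 for v in x]


def scale(weight, x):
    # Scalar stand-in for a zero-initialized convolution.
    return [weight * v for v in x]


def controlled_block(x, cond, w_in=0.0, w_out=0.0):
    """Toy version of y = F(x) + Z_out(F_copy(x + Z_in(cond)))."""
    injected = [a + b for a, b in zip(x, scale(w_in, cond))]
    residual = scale(w_out, trainable_copy(injected))
    return [a + b for a, b in zip(frozen_block(x), residual)]
```

With both weights at their zero initial value, `controlled_block(x, cond)` returns exactly `frozen_block(x)`, which is why attaching the control branch does not disturb the pretrained model before training; the condition only starts to matter once the weights move away from zero.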

Other released checkpoints v1-1

The authors released 14 different checkpoints, each trained with Stable Diffusion v1-5 on a different type of conditioning:

Model Name	Control Image Overview	Condition Image
lllyasviel/control_v11p_sd15_canny	Trained with canny edge detection	A monochrome image with white edges on a black background.
lllyasviel/control_v11e_sd15_ip2p	Trained with pixel-to-pixel instruction	No condition.
lllyasviel/control_v11p_sd15_inpaint	Trained with image inpainting	No condition.
lllyasviel/control_v11p_sd15_mlsd	Trained with multi-level line segment detection	An image with annotated line segments.
lllyasviel/control_v11f1p_sd15_depth	Trained with depth estimation	An image with depth information, usually represented as a grayscale image.
lllyasviel/control_v11p_sd15_normalbae	Trained with surface normal estimation	An image with surface normal information, usually represented as a color-coded image.
lllyasviel/control_v11p_sd15_seg	Trained with image segmentation	An image with segmented regions, usually represented as a color-coded image.
lllyasviel/control_v11p_sd15_lineart	Trained with line art generation	An image with line art, usually black lines on a white background.
lllyasviel/control_v11p_sd15s2_lineart_anime	Trained with anime line art generation	An image with anime-style line art.
lllyasviel/control_v11p_sd15_openpose	Trained with human pose estimation	An image with human poses, usually represented as a set of keypoints or skeletons.
lllyasviel/control_v11p_sd15_scribble	Trained with scribble-based image generation	An image with scribbles, usually random or user-drawn strokes.
lllyasviel/control_v11p_sd15_softedge	Trained with soft edge image generation	An image with soft edges, usually to create a more painterly or artistic effect.
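
To switch between these checkpoints in a script, the IDs in the table can be kept in a small lookup. This is only a convenience sketch: the short keys such as `"canny"` and the function name `checkpoint_for` are labels chosen here for illustration, not official identifiers.

```python
# Checkpoint IDs copied from the table above, keyed by a short condition name.
# The short keys are labels chosen for this sketch, not official identifiers.
CHECKPOINTS = {
    "canny": "lllyasviel/control_v11p_sd15_canny",
    "ip2p": "lllyasviel/control_v11e_sd15_ip2p",
    "inpaint": "lllyasviel/control_v11p_sd15_inpaint",
    "mlsd": "lllyasviel/control_v11p_sd15_mlsd",
    "depth": "lllyasviel/control_v11f1p_sd15_depth",
    "normalbae": "lllyasviel/control_v11p_sd15_normalbae",
    "seg": "lllyasviel/control_v11p_sd15_seg",
    "lineart": "lllyasviel/control_v11p_sd15_lineart",
    "lineart_anime": "lllyasviel/control_v11p_sd15s2_lineart_anime",
    "openpose": "lllyasviel/control_v11p_sd15_openpose",
    "scribble": "lllyasviel/control_v11p_sd15_scribble",
    "softedge": "lllyasviel/control_v11p_sd15_softedge",
}


def checkpoint_for(condition):
    """Return the checkpoint ID for a condition name, ignoring case and padding."""
    key = condition.strip().lower()
    if key not in CHECKPOINTS:
        known = ", ".join(sorted(CHECKPOINTS))
        raise KeyError(f"unknown condition {condition!r}; expected one of: {known}")
    return CHECKPOINTS[key]
```

The returned string can be passed as the first argument to `ControlNetModel.from_pretrained`, in the same way as the `checkpoint` variable in the usage example.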