ControlNet v1.1 Open-source Image Generation Model - Free Deployment, Precise Control of Image Generation with Line Detection

Control V11p Sd15 Mlsd

Developed by lllyasviel

ControlNet v1.1 is a conditioning network for Stable Diffusion. This checkpoint steers image generation with straight-line maps produced by the MLSD line segment detector.

Image Generation, Other

Open Source License: OpenRAIL

#Architectural Line Drawing Generation #Interior Design Assistance #Precise Edge Control

Downloads 12.15k

Release Time: 4/14/2023

Model Overview

This model conditions the diffusion process on MLSD line maps, which makes it particularly suitable for image generation tasks that require precise control of straight-line structures.

Model Features

Precise Line Control

Extracts linear structures from images using the MLSD detector to achieve precise geometric control over generated images.
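To make the idea of a line-based control image concrete, here is a minimal sketch that rasterizes a few line segments as white pixels on a black grid, which is the general shape of the map the detector hands to the model. The function name `draw_segments` and the segment coordinates are hypothetical, and the rasterizer is a plain Bresenham loop written for illustration, not the MLSD detector itself.

```python
def draw_segments(width, height, segments):
    """Rasterize (x0, y0, x1, y1) segments as white (255) pixels on a black (0) grid."""
    canvas = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in segments:
        # Integer Bresenham line walk from (x0, y0) to (x1, y1).
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        err = dx + dy
        x, y = x0, y0
        while True:
            if 0 <= x < width and 0 <= y < height:
                canvas[y][x] = 255
            if x == x1 and y == y1:
                break
            e2 = 2 * err
            if e2 >= dy:
                err += dy
                x += sx
            if e2 <= dx:
                err += dx
                y += sy
    return canvas


# Hypothetical segments standing in for detector output (not real MLSD results).
segments = [(0, 0, 7, 0), (0, 0, 0, 5), (0, 5, 7, 0)]
control_map = draw_segments(8, 6, segments)

# Print a small ASCII preview: '#' marks a line pixel, '.' marks background.
for row in control_map:
    print("".join("#" if value else "." for value in row))
```

In practice the detector produces this kind of map from a photograph, so the generated image inherits the straight edges of the input scene.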

Enhanced Training Data

Fixed dataset issues from v1.0, added 300K training samples, and applied data augmentation techniques.

Strong Compatibility

Compatible with Stable Diffusion v1.5 and its fine-tuned versions.

Efficient Training

Robust learning even on small datasets (<50K samples), with training speed comparable to fine-tuning diffusion models.

Model Capabilities

Line detection-based image generation

Architectural structure generation

Interior design generation

Geometric shape-controlled generation

Use Cases

Architectural Design

Architectural Floor Plan Generation

Generates architectural renderings based on MLSD-detected linear structures.

Produces architectural images with precise geometric structures.

Interior Design

Luxury Bedroom Design

Generates stylistically consistent interior design layouts based on line detection results.

The example shows a generated royal-style bedroom.

🚀 ControlNet - v1.1 - MLSD Version

ControlNet is a neural network structure that enables control of diffusion models by incorporating additional conditions. This specific checkpoint, based on MLSD images, can be used in conjunction with Stable Diffusion to generate diverse images.

🚀 Quick Start

ControlNet v1.1 is the successor to ControlNet v1.0 and was released in lllyasviel/ControlNet-v1-1 by Lvmin Zhang. This checkpoint is a conversion of the original checkpoint into the diffusers format and can be used with Stable Diffusion checkpoints such as runwayml/stable-diffusion-v1-5.

For more details, refer to the 🧨 Diffusers docs.

✨ Features

ControlNet is a neural network structure designed to control diffusion models by adding extra conditions. This checkpoint corresponds to the ControlNet conditioned on MLSD images.
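As a rough illustration of how an extra condition can be added without disturbing the base model, the toy sketch below follows the zero-initialized connection idea described in the ControlNet paper cited in this card. Everything here is a stand-in: the blocks are scalar functions rather than neural network layers, and the names `frozen_block`, `trainable_copy`, `zero_conv`, and `controlled_block` are hypothetical.

```python
def frozen_block(x):
    # Stand-in for a frozen, pretrained Stable Diffusion block (toy function).
    return [2.0 * v + 1.0 for v in x]


def trainable_copy(x):
    # Stand-in for the trainable copy of that block; it starts from the same weights.
    return [2.0 * v + 1.0 for v in x]


def zero_conv(x, weight=0.0, bias=0.0):
    # A 1x1 convolution reduced to a scalar scale and shift, initialized to zero.
    return [weight * v + bias for v in x]


def controlled_block(x, condition, w_in=0.0, w_out=0.0):
    # Inject the condition through one zero-initialized layer...
    injected = [a + b for a, b in zip(x, zero_conv(condition, w_in))]
    # ...and add the trainable branch back through another.
    residual = zero_conv(trainable_copy(injected), w_out)
    return [a + b for a, b in zip(frozen_block(x), residual)]


x = [1.0, 2.0]
condition = [0.5, -0.5]

# With zero weights, the condition has no effect and the base output is unchanged.
untrained = controlled_block(x, condition)
# Once the weights move away from zero, the condition changes the output.
trained = controlled_block(x, condition, w_in=1.0, w_out=1.0)
print(untrained, trained)
```

The point of the sketch is the starting state: before any training, the controlled block reproduces the frozen block exactly, so conditioning is learned as a gradual departure from the pretrained behavior.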

📦 Installation

Prerequisites

If you want to process an image to create the auxiliary conditioning, external dependencies are required:

Install https://github.com/patrickvonplaten/controlnet_aux

$ pip install controlnet_aux==0.3.0

Install diffusers and related packages:

$ pip install diffusers transformers accelerate
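After installing, a quick check like the following can confirm that the packages are importable before any model is downloaded. This is a minimal sketch using only the standard library; the `REQUIRED` list is an assumption based on the imports in the usage example, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed from the install commands and the imports used in the usage example.
REQUIRED = ["torch", "diffusers", "transformers", "accelerate", "controlnet_aux"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```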

💻 Usage Examples

Basic Usage

This checkpoint is best used with Stable Diffusion v1-5, the model it was trained on. Experimentally, it also works with other diffusion models, such as DreamBooth fine-tunes of Stable Diffusion.

import os

import torch
from controlnet_aux import MLSDdetector
from diffusers.utils import load_image

from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)

checkpoint = "lllyasviel/control_v11p_sd15_mlsd"

image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_mlsd/resolve/main/images/input.png"
)

prompt = "royal chamber with fancy bed"

processor = MLSDdetector.from_pretrained('lllyasviel/ControlNet')

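control_image = processor(image)

# Create the output directory first; save() fails if it does not exist.
os.makedirs("images", exist_ok=True)
control_image.save("./images/control.png")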

controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=30, generator=generator, image=control_image).images[0]

image.save('images/image_out.png')
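One knob that is often worth exploring is how strongly the line map constrains the result. The sketch below builds keyword arguments for the pipeline call at several values of `controlnet_conditioning_scale`, a parameter of the diffusers ControlNet pipeline that does not appear in the example above. The helper name `generation_kwargs`, the chosen scale values, and the `None` placeholder for the control image are assumptions made so the sketch runs without downloading a model.

```python
def generation_kwargs(prompt, control_image, steps=30, conditioning_scale=1.0):
    """Collect keyword arguments for a StableDiffusionControlNetPipeline call."""
    return {
        "prompt": prompt,
        "image": control_image,
        "num_inference_steps": steps,
        "controlnet_conditioning_scale": conditioning_scale,
    }


# Placeholder only: in real use, pass the PIL image returned by processor(image).
control_image = None

# Hypothetical sweep over conditioning strengths.
sweep = [
    generation_kwargs("royal chamber with fancy bed", control_image, conditioning_scale=scale)
    for scale in (0.5, 1.0, 1.5)
]

for kwargs in sweep:
    print(kwargs["controlnet_conditioning_scale"], kwargs["num_inference_steps"])
```

With the pipeline from the example above, each entry could be passed as `pipe(**kwargs, generator=torch.manual_seed(0)).images[0]`. Lower values should let the prompt dominate, while higher values should follow the detected lines more closely.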


📚 Documentation

Model Details

Property	Details
Developed by	Lvmin Zhang, Maneesh Agrawala
Model Type	Diffusion-based text-to-image generation model
Language(s)	English
License	The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.
Resources for more information	GitHub Repository, Paper.
Cite as	@misc{zhang2023adding, title={Adding Conditional Control to Text-to-Image Diffusion Models}, author={Lvmin Zhang and Maneesh Agrawala}, year={2023}, eprint={2302.05543}, archivePrefix={arXiv}, primaryClass={cs.CV} }

Introduction

ControlNet was proposed in Adding Conditional Control to Text-to-Image Diffusion Models by Lvmin Zhang and Maneesh Agrawala.

The abstract reads as follows:

We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.

Other released v1.1 checkpoints

The authors released 14 different checkpoints, each trained with Stable Diffusion v1-5 on a different type of conditioning:

Model Name	Control Image Overview	Condition Image
lllyasviel/control_v11p_sd15_canny	Trained with canny edge detection	A monochrome image with white edges on a black background.
lllyasviel/control_v11e_sd15_ip2p	Trained with pixel to pixel instruction	No condition.
lllyasviel/control_v11p_sd15_inpaint	Trained with image inpainting	No condition.
lllyasviel/control_v11p_sd15_mlsd	Trained with M-LSD (Mobile Line Segment Detection) line detection	An image with annotated line segments.
lllyasviel/control_v11f1p_sd15_depth	Trained with depth estimation	An image with depth information, usually represented as a grayscale image.
lllyasviel/control_v11p_sd15_normalbae	Trained with surface normal estimation	An image with surface normal information, usually represented as a color-coded image.
lllyasviel/control_v11p_sd15_seg	Trained with image segmentation	An image with segmented regions, usually represented as a color-coded image.
lllyasviel/control_v11p_sd15_lineart	Trained with line art generation	An image with line art, usually black lines on a white background.
lllyasviel/control_v11p_sd15s2_lineart_anime	Trained with anime line art generation	An image with anime-style line art.
lllyasviel/control_v11p_sd15_openpose	Trained with human pose estimation	An image with human poses, usually represented as a set of keypoints or skeletons.
lllyasviel/control_v11p_sd15_scribble	Trained with scribble-based image generation	An image with scribbles, usually random or user-drawn strokes.