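🚀 ControlNet Scribble SDXL-1.0 Model
This is a powerful ControlNet model that can generate high-resolution images visually comparable to Midjourney. It supports lines of any type and any width; a simple sketch and a prompt are enough to produce a visually appealing image.
🚀 Quick Start
Use the code below to get started with the model: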
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers import DDIMScheduler, EulerAncestralDiscreteScheduler
from controlnet_aux import PidiNetDetector, HEDdetector
from diffusers.utils import load_image
from huggingface_hub import HfApi
from pathlib import Path
from PIL import Image
import torch
import numpy as np
import cv2
import os
import random  # required by random.random() and random.uniform() below

def nms(x, t, s):
    # Non-maximum suppression: blur the edge map, keep only the pixels that are
    # local maxima along one of four directions, then binarize at threshold t.
    x = cv2.GaussianBlur(x.astype(np.float32), (0, 0), s)

    # Directional 3x3 kernels: horizontal, vertical, and the two diagonals
    f1 = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=np.uint8)
    f2 = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8)
    f3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.uint8)
    f4 = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=np.uint8)

    y = np.zeros_like(x)

    for f in [f1, f2, f3, f4]:
        np.putmask(y, cv2.dilate(x, kernel=f) == x, x)

    z = np.zeros_like(y, dtype=np.uint8)
    z[y > t] = 255
    return z

controlnet_conditioning_scale = 1.0
prompt = "your prompt, the longer the better, you can describe it in as much detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'
eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")
controlnet = ControlNetModel.from_pretrained(
    "xinsir/controlnet-scribble-sdxl-1.0",
    torch_dtype=torch.float16
)
# when testing with another base model, you need to change the vae as well.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    safety_checker=None,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
)
# assumption: a CUDA GPU is available; float16 weights are not intended for CPU inference
pipe.to("cuda")

# you can either use HED to generate a fake scribble from an image, or use a sketch drawn entirely by yourself
if random.random() > 0.5:
    # Method 1
    # if you use HED, provide an image (real or anime); its HED edges are extracted and used as the scribble
    # For details on HED detection, see https://github.com/lllyasviel/ControlNet/blob/main/gradio_fake_scribble2image.py
    # Below is an example using the HED detector from controlnet_aux
    # image_path = Image.open("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    image_path = cv2.imread("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    processor = HEDdetector.from_pretrained('lllyasviel/Annotators')
    controlnet_img = processor(image_path, scribble=False)
    controlnet_img.save("a hed detect path for an image")

    # the following processing simulates a human-drawn sketch; different thresholds give different line widths
    controlnet_img = np.array(controlnet_img)
    controlnet_img = nms(controlnet_img, 127, 3)
    controlnet_img = cv2.GaussianBlur(controlnet_img, (0, 0), 3)

    # higher threshold, thinner line
    random_val = int(round(random.uniform(0.01, 0.10), 2) * 255)
    controlnet_img[controlnet_img > random_val] = 255
    controlnet_img[controlnet_img < 255] = 0
    controlnet_img = Image.fromarray(controlnet_img)

else:
    # Method 2
    # if you use a sketch drawn entirely by yourself
    control_path = "the sketch image you draw with some tools, like drawing board, the path you save it"
    controlnet_img = Image.open(control_path)  # Note: the image must be black and white (0 or 255), like the examples we list

# must resize to 1024*1024 or the same resolution bucket to get the best performance
width, height = controlnet_img.size
ratio = np.sqrt(1024. * 1024. / (width * height))
new_width, new_height = int(width * ratio), int(height * ratio)
controlnet_img = controlnet_img.resize((new_width, new_height))

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=controlnet_img,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=new_width,
    height=new_height,
    num_inference_steps=30,
).images

# png usually gives better image quality than jpg or webp, at the cost of much larger files
images[0].save("your image save path, png format is usually better than jpg or webp in terms of image quality but got much bigger")
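The resize step at the end of the script scales the sketch so that its area is roughly 1024 × 1024 pixels while keeping the aspect ratio. The arithmetic can be sketched in isolation as below. `target_size` is a hypothetical helper name, and snapping each side to a multiple of 8 is an assumption added here (the original script simply truncates with `int`), motivated by the fact that the SDXL VAE works on 8× downsampled latents.

```python
import math


def target_size(width, height, target_area=1024 * 1024, multiple=8):
    """Scale (width, height) to roughly target_area pixels, keeping the aspect ratio.

    Rounding each side to the nearest multiple of `multiple` is an assumption
    of this sketch, not something the original script does.
    """
    ratio = math.sqrt(target_area / (width * height))
    new_width = round(width * ratio / multiple) * multiple
    new_height = round(height * ratio / multiple) * multiple
    return new_width, new_height


print(target_size(512, 512))    # (1024, 1024)
print(target_size(1920, 1080))  # (1368, 768)
```

A 1920 × 1080 sketch therefore ends up at 1368 × 768, which is close to the 1024 × 1024 pixel budget while preserving the 16:9 shape.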
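✨ Key Features
- Strong image generation: produces high-resolution images visually comparable to Midjourney.
- Broad line support: accepts lines of any type and width, and both the sketch and the prompt can be very simple.
- High aesthetic quality: trained on a large amount of high-quality data with data augmentation, multiple losses, and multi-resolution training; its aesthetic performance is better than that of controlnet-canny-sdxl-1.0.
- Strong controllability: if a local region of the generated image is unsatisfactory, drawing a more precise sketch and giving a more detailed prompt helps considerably.
- Multiple line types: supports line art as well as Canny edges.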
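💻 Usage Examples
Basic Usage
The code in the Quick Start section above is the basic usage example: follow its steps to generate an image from a simple sketch and a prompt.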
📚 Documentation
Model Details
Model Description
- Developed by: xinsir
- Model type: ControlNet_SDXL
- License: apache-2.0
- Finetuned from model: stabilityai/stable-diffusion-xl-base-1.0
Model Sources
- Paper: https://arxiv.org/abs/2302.05543
Examples
The prompts below were used to generate example images with this model. All examples were generated with stabilityai/stable-diffusion-xl-base-1.0 and xinsir/controlnet-scribble-sdxl-1.0:
- Prompt: purple feathered eagle with specks of light like stars in feathers. It glows with arcane power
- Prompt: manga girl in the city, drip marketing
- Prompt: 17 year old girl with long dark hair in the style of realism with fantasy elements, detailed botanical illustrations, barbs and thorns, ethereal, magical, black, purple and maroon, intricate, photorealistic
- Prompt: a logo for a paintball field named district 7 on a white background featuring paintballs the is bright and colourful eye catching and impactuful
- Prompt: a photograph of a handsome crying blonde man with his face painted in the pride flag
- Prompt: simple flat sketch fox play ball
- Prompt: concept art, a surreal magical Tome of the Sun God, the book binding appears to be made of solar fire and emits a holy, radiant glow, Age of Wonders, Unreal Engine v5
- Prompt: black Caribbean man walking balance front his fate chaos anarchy liberty independence force energy independence cinematic surreal beautiful rendition intricate sharp detail 8k
- Prompt: die hard nakatomi plaza, explosion at the top, vector, night scene
- Prompt: solitary glowing yellow tree in a desert. ultra wide shot. night time. hdr photography
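Evaluation Data
The test data is randomly sampled from Midjourney upscaled images together with their prompts. The goal of the project is to let people draw images like those produced by Midjourney users, who include many professional designers. Their upscaled images tend to have higher aesthetic scores and better prompt consistency, which makes them a suitable test set for evaluating a ControlNet. 300 prompt-image pairs were selected at random and 4 images were generated per prompt, for 1200 images in total. Aesthetics is measured with the Laion aesthetic score and control ability with perceptual similarity; image quality was found to agree well with the metric values. The model was compared with other SOTA models on Hugging Face, and the results are as follows: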
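Quantitative Results
Metric | xinsir/controlnet-scribble-sdxl-1.0 |
---|---|
Laion aesthetic score (higher is better) | 6.03 |
Perceptual similarity (lower is better) | 0.5701 |
Note: these values were computed on images saved in webp format. Saving as png raises the aesthetic values by 0.1 to 0.3, but the relative ordering stays the same.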
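The evaluation protocol described above (300 prompt-image pairs, 4 generations per prompt, metrics averaged over all 1200 images) can be outlined as follows. This is only a sketch of the bookkeeping: `generate`, `aesthetic_score`, and `perceptual_distance` are hypothetical callables standing in for the real pipeline and metric models, and comparing each generated image against the reference image is an assumption, since the source does not say what the perceptual similarity is computed against.

```python
from statistics import mean


def evaluate(pairs, generate, aesthetic_score, perceptual_distance, samples_per_prompt=4):
    """Average two metrics over every generated image.

    pairs: iterable of (prompt, reference_image).
    generate, aesthetic_score, perceptual_distance: hypothetical callables
    supplied by the caller; they stand in for the pipeline and metric models.
    """
    aesthetic, perceptual = [], []
    for prompt, reference in pairs:
        for seed in range(samples_per_prompt):
            image = generate(prompt, reference, seed)
            aesthetic.append(aesthetic_score(image))
            # Assumption: control is measured against the reference image.
            perceptual.append(perceptual_distance(image, reference))
    return {
        "num_images": len(aesthetic),
        "aesthetic_mean": mean(aesthetic),
        "perceptual_mean": mean(perceptual),
    }


# Dummy stand-ins so the sketch runs end to end; the zeros are placeholders, not results.
pairs = [(f"prompt {i}", f"reference {i}") for i in range(300)]
report = evaluate(
    pairs,
    generate=lambda prompt, reference, seed: (prompt, seed),
    aesthetic_score=lambda image: 0.0,
    perceptual_distance=lambda image, reference: 0.0,
)
print(report["num_images"])  # 1200
```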
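Conclusion
In the evaluation, the model generates visually appealing images from simple sketches and prompts. It supports lines of any type and width: thick lines give coarser control and follow the written prompt more closely, while thin lines give stronger control and follow the condition image more closely. The model scores higher on aesthetics than xinsir/controlnet-canny-sdxl-1.0, but its control ability is slightly lower because thick lines are used.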
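Line width is set in the Quick Start preprocessing by the binarization threshold ("higher threshold, thinner line"). The relation can be illustrated on a one-dimensional cross-section of a blurred stroke. The pixel values below are invented for illustration, and `binarize` and `line_width` are hypothetical helper names; only the two masking steps mirror the original script.

```python
def binarize(profile, threshold):
    """Mirror the script's two masking steps: > threshold becomes 255, everything else 0."""
    return [255 if value > threshold else 0 for value in profile]


def line_width(profile, threshold):
    """Count how many pixels of the cross-section survive binarization."""
    return sum(1 for value in binarize(profile, threshold) if value == 255)


# Illustrative cross-section of a Gaussian-blurred stroke (values are invented).
blurred_stroke = [0, 1, 4, 12, 30, 60, 90, 60, 30, 12, 4, 1, 0]

# The script samples the threshold as int(round(uniform(0.01, 0.10), 2) * 255),
# which yields integers from 2 to 25.
for threshold in (2, 10, 25):
    print(threshold, line_width(blurred_stroke, threshold))
# 2 9
# 10 7
# 25 5
```

Lower thresholds keep more of the blurred halo and produce thicker lines, which according to the conclusion above lets the prompt dominate; higher thresholds produce thinner lines and tighter adherence to the sketch.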
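🔧 Technical Details
The model was trained on a large amount of high-quality data (more than 10,000,000 images) that was carefully filtered and captioned (using a powerful vllm model). Useful training tricks were applied, including data augmentation, multiple losses, and multi-resolution training. These techniques let the model strike a good balance between aesthetic quality and control ability.
📄 License
This model is released under the apache-2.0 license.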

