🚀 ControlNet Scribble SDXL-1.0
This is a powerful ControlNet model that generates high-resolution images with visual quality comparable to Midjourney. It supports lines of any type and width: given just a simple sketch and a prompt, it can produce highly appealing images.
🚀 Quick Start
Use the following code to get started with the model:
```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers import DDIMScheduler, EulerAncestralDiscreteScheduler
from controlnet_aux import PidiNetDetector, HEDdetector
from diffusers.utils import load_image
from huggingface_hub import HfApi
from pathlib import Path
from PIL import Image
import torch
import numpy as np
import cv2
import os
import random


def nms(x, t, s):
    # Non-maximum suppression: thin a soft HED edge map into scribble-like lines.
    x = cv2.GaussianBlur(x.astype(np.float32), (0, 0), s)

    f1 = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=np.uint8)
    f2 = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8)
    f3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.uint8)
    f4 = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=np.uint8)

    y = np.zeros_like(x)
    for f in [f1, f2, f3, f4]:
        np.putmask(y, cv2.dilate(x, kernel=f) == x, x)

    z = np.zeros_like(y, dtype=np.uint8)
    z[y > t] = 255
    return z


controlnet_conditioning_scale = 1.0
prompt = "your prompt, the longer the better, you can describe it in as much detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'

eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")

controlnet = ControlNetModel.from_pretrained(
    "xinsir/controlnet-scribble-sdxl-1.0",
    torch_dtype=torch.float16
)

# When testing with another base model, you need to change the VAE as well.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    safety_checker=None,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
)

# You can either use HED to generate a fake scribble from an existing image,
# or use a sketch image that you drew entirely yourself.
if random.random() > 0.5:
    # Method 1
    # If you use HED, provide an image (real or anime); its HED lines are extracted and used as the scribble.
    # For details on HED detection, see https://github.com/lllyasviel/ControlNet/blob/main/gradio_fake_scribble2image.py
    # Below is an example using the diffusers HED detector.
    # image_path = Image.open("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    image_path = cv2.imread("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    processor = HEDdetector.from_pretrained('lllyasviel/Annotators')
    controlnet_img = processor(image_path, scribble=False)
    controlnet_img.save("a hed detect path for an image")

    # The following processing simulates a human sketch; different thresholds produce different line widths.
    controlnet_img = np.array(controlnet_img)
    controlnet_img = nms(controlnet_img, 127, 3)
    controlnet_img = cv2.GaussianBlur(controlnet_img, (0, 0), 3)

    # A higher threshold gives thinner lines.
    random_val = int(round(random.uniform(0.01, 0.10), 2) * 255)
    controlnet_img[controlnet_img > random_val] = 255
    controlnet_img[controlnet_img < 255] = 0
    controlnet_img = Image.fromarray(controlnet_img)
else:
    # Method 2
    # If you use a sketch image drawn entirely by yourself:
    control_path = "the sketch image you draw with some tools, like drawing board, the path you save it"
    controlnet_img = Image.open(control_path)  # the image must be black and white (0 or 255), like the examples listed

# Resize to 1024*1024 or the same resolution bucket to get the best performance.
width, height = controlnet_img.size
ratio = np.sqrt(1024. * 1024. / (width * height))
new_width, new_height = int(width * ratio), int(height * ratio)
controlnet_img = controlnet_img.resize((new_width, new_height))

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=controlnet_img,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=new_width,
    height=new_height,
    num_inference_steps=30,
).images

# png usually preserves quality better than jpg or webp, but the files are much bigger.
images[0].save("your image save path")
```
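The example above never moves the pipeline to a device, so the fp16 weights would run very slowly (or fail) on CPU. A hedged addition, not part of the original snippet: place the pipeline on CUDA, or let diffusers offload idle sub-models when VRAM is limited.

```python
# Move the pipeline to the GPU for fp16 inference (assumes a CUDA device is available).
pipe.to("cuda")

# Alternatively, for limited VRAM, skip .to("cuda") and enable CPU offloading instead:
# pipe.enable_model_cpu_offload()
```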
✨ Key Features
- Powerful image generation: produces high-resolution images whose visual quality is comparable to Midjourney.
- Broad line support: lines of any type and width are supported, so the sketch and prompt can be very simple.
- High aesthetic quality: trained on a large amount of high-quality data with data augmentation, multiple losses, and multi-resolution training, it outperforms the controlnet-canny-sdxl-1.0 model in aesthetics.
- Strong controllability: if a local region of the generated image is unsatisfactory, drawing a more precise sketch and giving a more detailed prompt helps a lot.
- Multiple line types: lineart or canny lines are supported (see the sketch after this list for one way to prepare a canny conditioning image).
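The model card does not show how its canny-style inputs were produced; as a minimal, hedged sketch, the snippet below uses OpenCV's Canny detector to turn an ordinary photo into the black-and-white (0 or 255) line image the pipeline expects. The file paths and thresholds are placeholders, not values from the card.

```python
import cv2
from PIL import Image

# Hypothetical input path; any real or anime image works.
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Canny thresholds are illustrative; raise them for fewer, cleaner lines.
edges = cv2.Canny(img, 100, 200)

# White lines on a black background, ready to be passed as `image=` to the pipeline.
Image.fromarray(edges).save("canny_condition.png")
```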
📦 Installation
The original documentation does not provide specific installation steps, so this section is skipped.
💻 Usage Examples
Basic Usage
The code in the "Quick Start" section above is the basic usage example: follow its steps to generate images from a simple sketch and a prompt.
Advanced Usage
The original documentation does not provide advanced usage examples, so this part is skipped.
📚 Documentation
Model Details
Model Description
- Developed by: xinsir
- Model type: ControlNet_SDXL
- License: apache-2.0
- Finetuned from model: stabilityai/stable-diffusion-xl-base-1.0
Model Sources
- Paper: https://arxiv.org/abs/2302.05543
Examples
Below are some example prompts used to generate images with this model; note that all examples were generated with stabilityai/stable-diffusion-xl-base-1.0 and xinsir/controlnet-scribble-sdxl-1.0:
- Prompt: purple feathered eagle with specks of light like stars in feathers. It glows with arcane power
- Prompt: manga girl in the city, drip marketing
- Prompt: 17 year old girl with long dark hair in the style of realism with fantasy elements, detailed botanical illustrations, barbs and thorns, ethereal, magical, black, purple and maroon, intricate, photorealistic
- Prompt: a logo for a paintball field named district 7 on a white background featuring paintballs the is bright and colourful eye catching and impactuful
- Prompt: a photograph of a handsome crying blonde man with his face painted in the pride flag
- Prompt: simple flat sketch fox play ball
- Prompt: concept art, a surreal magical Tome of the Sun God, the book binding appears to be made of solar fire and emits a holy, radiant glow, Age of Wonders, Unreal Engine v5
- Prompt: black Caribbean man walking balance front his fate chaos anarchy liberty independence force energy independence cinematic surreal beautiful rendition intricate sharp detail 8k
- Prompt: die hard nakatomi plaza, explosion at the top, vector, night scene
- Prompt: solitary glowing yellow tree in a desert. ultra wide shot. night time. hdr photography
Evaluation Data
The test data was randomly sampled from Midjourney upscaled images together with their prompts. Since the goal of the project is to let people draw Midjourney-level images, and Midjourney's users include many professional designers, the upscaled images tend to have high aesthetic scores and good prompt consistency, which makes them a suitable test set for evaluating a ControlNet's ability. 300 prompt-image pairs were randomly selected, and 4 images were generated per prompt, giving 1200 images in total. Aesthetics were measured with the Laion Aesthetic Score and controllability with perceptual similarity, and the metric values were found to agree well with the perceived image quality. The model was compared with other SOTA Hugging Face models; the results are shown below (a hedged sketch of the perceptual-similarity computation follows the table).
Quantitative Results

| Metric | xinsir/controlnet-scribble-sdxl-1.0 |
| --- | --- |
| Laion Aesthetic Score (higher is better) | 6.03 |
| Perceptual Similarity (lower is better) | 0.5701 |

Note: these values were computed on images saved in webp format; saving as png raises the aesthetic score by 0.1 - 0.3, but the relative ranking stays the same.
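The card does not state which perceptual-similarity implementation produced the 0.5701 figure; purely as an illustration, the sketch below computes LPIPS, a common perceptual-similarity metric, between a reference image and a generated image using the `lpips` package. The backbone choice and file paths are assumptions, not the authors' setup.

```python
import lpips
import numpy as np
import torch
from PIL import Image

# LPIPS with an AlexNet backbone; lower scores mean the images are perceptually closer.
loss_fn = lpips.LPIPS(net='alex')

def to_tensor(path):
    # Load an image and scale it to the [-1, 1] range that lpips expects.
    img = np.array(Image.open(path).convert("RGB").resize((512, 512)), dtype=np.float32)
    img = img / 127.5 - 1.0
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

# Placeholder paths for a reference image and a generated image.
score = loss_fn(to_tensor("reference.png"), to_tensor("generated.png"))
print(float(score))
```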
Conclusion
In the evaluation, the model can generate visually appealing images from simple sketches and prompts. It supports lines of any type and width: thick lines give rougher control and results that follow the written prompt more closely, while thin lines give stronger control and results that follow the conditioning image more closely. The model scores higher on aesthetics than xinsir/controlnet-canny-sdxl-1.0, but the use of thick lines costs a little controllability (a hedged sketch for adjusting line width follows).
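The card does not prescribe how to switch between thick and thin lines; one hedged way to do it is morphological dilation or erosion of the binary scribble before it is passed to the pipeline. The kernel size and paths below are placeholders.

```python
import cv2
import numpy as np
from PIL import Image

# Placeholder path to a black-and-white (0 or 255) scribble image.
scribble = np.array(Image.open("scribble.png").convert("L"))

kernel = np.ones((5, 5), np.uint8)  # a larger kernel makes the change more pronounced

# Thicker lines: rougher control, the output follows the prompt more.
thick = cv2.dilate(scribble, kernel, iterations=1)

# Thinner lines: stronger control, the output follows the sketch more.
thin = cv2.erode(scribble, kernel, iterations=1)

Image.fromarray(thick).save("scribble_thick.png")
Image.fromarray(thin).save("scribble_thin.png")
```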
🔧 Technical Details
The model was trained on a large amount of high-quality data (over 10,000,000 images), carefully filtered and captioned (with a powerful vllm model). Useful tricks were applied during training, including data augmentation, multiple losses, and multi-resolution training. Together these techniques let the model reach a good balance between aesthetic quality and controllability.
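The card does not detail the multi-resolution scheme. As an assumption-laden sketch only, the snippet below shows one common interpretation, aspect-ratio bucketing: each image is resized so its pixel count stays near 1024x1024 while its aspect ratio is preserved, mirroring the resize logic in the quick-start code. The 64-pixel rounding and target area are illustrative choices, not the authors' recipe.

```python
import numpy as np

def to_resolution_bucket(width, height, target_pixels=1024 * 1024, multiple=64):
    # Scale so the area is roughly target_pixels while keeping the aspect ratio,
    # then snap both sides to a multiple of 64, as SDXL-style training commonly does.
    ratio = np.sqrt(target_pixels / (width * height))
    new_w = max(multiple, int(round(width * ratio / multiple)) * multiple)
    new_h = max(multiple, int(round(height * ratio / multiple)) * multiple)
    return new_w, new_h

print(to_resolution_bucket(1920, 1080))  # a 16:9 photo maps to a wide bucket such as (1344, 768)
```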
📄 License
This model is released under the apache-2.0 license.

