🚀 ControlNet Scribble SDXL-1.0
This is a powerful ControlNet model that generates high-resolution images with visual quality comparable to Midjourney. It supports lines of any type and width: given just a simple sketch and a prompt, it can produce highly appealing images.
🚀 Quick Start
Use the following code to get started with the model:
```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers import DDIMScheduler, EulerAncestralDiscreteScheduler
from controlnet_aux import PidiNetDetector, HEDdetector
from diffusers.utils import load_image
from huggingface_hub import HfApi
from pathlib import Path
from PIL import Image
import torch
import numpy as np
import cv2
import os
import random


def nms(x, t, s):
    # Non-maximum suppression: thin a soft HED edge map into scribble-like lines.
    x = cv2.GaussianBlur(x.astype(np.float32), (0, 0), s)

    f1 = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=np.uint8)
    f2 = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8)
    f3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.uint8)
    f4 = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=np.uint8)

    y = np.zeros_like(x)
    for f in [f1, f2, f3, f4]:
        np.putmask(y, cv2.dilate(x, kernel=f) == x, x)

    z = np.zeros_like(y, dtype=np.uint8)
    z[y > t] = 255
    return z


controlnet_conditioning_scale = 1.0
prompt = "your prompt, the longer the better, you can describe it in as much detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'

eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")

controlnet = ControlNetModel.from_pretrained(
    "xinsir/controlnet-scribble-sdxl-1.0",
    torch_dtype=torch.float16
)

# When testing with another base model, you need to change the VAE as well.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    safety_checker=None,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
)

# You can either use HED to generate a fake scribble from an existing image,
# or use a sketch image that you drew entirely yourself.
if random.random() > 0.5:
    # Method 1
    # If you use HED, provide an image (real or anime); its HED lines are extracted and used as the scribble.
    # For details on HED detection, see https://github.com/lllyasviel/ControlNet/blob/main/gradio_fake_scribble2image.py
    # Below is an example using the diffusers HED detector.
    # image_path = Image.open("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    image_path = cv2.imread("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    processor = HEDdetector.from_pretrained('lllyasviel/Annotators')
    controlnet_img = processor(image_path, scribble=False)
    controlnet_img.save("a hed detect path for an image")

    # The following processing simulates a human sketch; different thresholds produce different line widths.
    controlnet_img = np.array(controlnet_img)
    controlnet_img = nms(controlnet_img, 127, 3)
    controlnet_img = cv2.GaussianBlur(controlnet_img, (0, 0), 3)

    # A higher threshold gives thinner lines.
    random_val = int(round(random.uniform(0.01, 0.10), 2) * 255)
    controlnet_img[controlnet_img > random_val] = 255
    controlnet_img[controlnet_img < 255] = 0
    controlnet_img = Image.fromarray(controlnet_img)
else:
    # Method 2
    # If you use a sketch image drawn entirely by yourself:
    control_path = "the sketch image you draw with some tools, like drawing board, the path you save it"
    controlnet_img = Image.open(control_path)  # the image must be black and white (0 or 255), like the examples listed

# Resize to 1024*1024 or the same resolution bucket to get the best performance.
width, height = controlnet_img.size
ratio = np.sqrt(1024. * 1024. / (width * height))
new_width, new_height = int(width * ratio), int(height * ratio)
controlnet_img = controlnet_img.resize((new_width, new_height))

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=controlnet_img,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=new_width,
    height=new_height,
    num_inference_steps=30,
).images

# png usually preserves quality better than jpg or webp, but the files are much bigger.
images[0].save("your image save path")
```
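The example above never moves the pipeline to a device, so the fp16 weights would run very slowly (or fail) on CPU. A hedged addition, not part of the original snippet: place the pipeline on CUDA, or let diffusers offload idle sub-models when VRAM is limited.

```python
# Move the pipeline to the GPU for fp16 inference (assumes a CUDA device is available).
pipe.to("cuda")

# Alternatively, for limited VRAM, skip .to("cuda") and enable CPU offloading instead:
# pipe.enable_model_cpu_offload()
```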
✨ Key Features
- Powerful image generation: produces high-resolution images whose visual quality is comparable to Midjourney.
- Broad line support: lines of any type and width are supported, so the sketch and prompt can be very simple.
- High aesthetic quality: trained on a large amount of high-quality data with data augmentation, multiple losses, and multi-resolution training, it outperforms the controlnet-canny-sdxl-1.0 model in aesthetics.
- Strong controllability: if a local region of the generated image is unsatisfactory, drawing a more precise sketch and giving a more detailed prompt helps a lot.
- Multiple line types: lineart or canny lines are supported (see the sketch after this list for one way to prepare a canny conditioning image).
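The model card does not show how its canny-style inputs were produced; as a minimal, hedged sketch, the snippet below uses OpenCV's Canny detector to turn an ordinary photo into the black-and-white (0 or 255) line image the pipeline expects. The file paths and thresholds are placeholders, not values from the card.

```python
import cv2
from PIL import Image

# Hypothetical input path; any real or anime image works.
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Canny thresholds are illustrative; raise them for fewer, cleaner lines.
edges = cv2.Canny(img, 100, 200)

# White lines on a black background, ready to be passed as `image=` to the pipeline.
Image.fromarray(edges).save("canny_condition.png")
```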
📦 Installation
The original documentation does not provide specific installation steps, so this section is skipped.
💻 Usage Examples
Basic Usage
The code in the "Quick Start" section above is the basic usage example: follow its steps to generate images from a simple sketch and a prompt.
Advanced Usage
The original documentation does not provide advanced usage examples, so this part is skipped.
📚 Documentation
Model Details
Model Description
- Developed by: xinsir
- Model type: ControlNet_SDXL
- License: apache-2.0
- Finetuned from model: stabilityai/stable-diffusion-xl-base-1.0
Model Sources
- Paper: https://arxiv.org/abs/2302.05543
Examples
Below are some example prompts used to generate images with this model; note that all examples were generated with stabilityai/stable-diffusion-xl-base-1.0 and xinsir/controlnet-scribble-sdxl-1.0:
- Prompt: purple feathered eagle with specks of light like stars in feathers. It glows with arcane power
- Prompt: manga girl in the city, drip marketing
- Prompt: 17 year old girl with long dark hair in the style of realism with fantasy elements, detailed botanical illustrations, barbs and thorns, ethereal, magical, black, purple and maroon, intricate, photorealistic
- Prompt: a logo for a paintball field named district 7 on a white background featuring paintballs the is bright and colourful eye catching and impactuful
- Prompt: a photograph of a handsome crying blonde man with his face painted in the pride flag
- Prompt: simple flat sketch fox play ball
- Prompt: concept art, a surreal magical Tome of the Sun God, the book binding appears to be made of solar fire and emits a holy, radiant glow, Age of Wonders, Unreal Engine v5
- Prompt: black Caribbean man walking balance front his fate chaos anarchy liberty independence force energy independence cinematic surreal beautiful rendition intricate sharp detail 8k
- Prompt: die hard nakatomi plaza, explosion at the top, vector, night scene
- Prompt: solitary glowing yellow tree in a desert. ultra wide shot. night time. hdr photography
Evaluation Data
The test data was randomly sampled from Midjourney upscaled images together with their prompts. Since the goal of the project is to let people draw Midjourney-level images, and Midjourney's users include many professional designers, the upscaled images tend to have high aesthetic scores and good prompt consistency, which makes them a suitable test set for evaluating a ControlNet's ability. 300 prompt-image pairs were randomly selected, and 4 images were generated per prompt, giving 1200 images in total. Aesthetics were measured with the Laion Aesthetic Score and controllability with perceptual similarity, and the metric values were found to agree well with the perceived image quality. The model was compared with other SOTA Hugging Face models; the results are shown below (a hedged sketch of the perceptual-similarity computation follows the table).
Quantitative Results

| Metric | xinsir/controlnet-scribble-sdxl-1.0 |
| --- | --- |
| Laion Aesthetic Score (higher is better) | 6.03 |
| Perceptual Similarity (lower is better) | 0.5701 |

Note: these values were computed on images saved in webp format; saving as png raises the aesthetic score by 0.1 - 0.3, but the relative ranking stays the same.
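The card does not state which perceptual-similarity implementation produced the 0.5701 figure; purely as an illustration, the sketch below computes LPIPS, a common perceptual-similarity metric, between a reference image and a generated image using the `lpips` package. The backbone choice and file paths are assumptions, not the authors' setup.

```python
import lpips
import numpy as np
import torch
from PIL import Image

# LPIPS with an AlexNet backbone; lower scores mean the images are perceptually closer.
loss_fn = lpips.LPIPS(net='alex')

def to_tensor(path):
    # Load an image and scale it to the [-1, 1] range that lpips expects.
    img = np.array(Image.open(path).convert("RGB").resize((512, 512)), dtype=np.float32)
    img = img / 127.5 - 1.0
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

# Placeholder paths for a reference image and a generated image.
score = loss_fn(to_tensor("reference.png"), to_tensor("generated.png"))
print(float(score))
```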
Conclusion
In the evaluation, the model can generate visually appealing images from simple sketches and prompts. It supports lines of any type and width: thick lines give rougher control and results that follow the written prompt more closely, while thin lines give stronger control and results that follow the conditioning image more closely. The model scores higher on aesthetics than xinsir/controlnet-canny-sdxl-1.0, but the use of thick lines costs a little controllability (a hedged sketch for adjusting line width follows).
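The card does not prescribe how to switch between thick and thin lines; one hedged way to do it is morphological dilation or erosion of the binary scribble before it is passed to the pipeline. The kernel size and paths below are placeholders.

```python
import cv2
import numpy as np
from PIL import Image

# Placeholder path to a black-and-white (0 or 255) scribble image.
scribble = np.array(Image.open("scribble.png").convert("L"))

kernel = np.ones((5, 5), np.uint8)  # a larger kernel makes the change more pronounced

# Thicker lines: rougher control, the output follows the prompt more.
thick = cv2.dilate(scribble, kernel, iterations=1)

# Thinner lines: stronger control, the output follows the sketch more.
thin = cv2.erode(scribble, kernel, iterations=1)

Image.fromarray(thick).save("scribble_thick.png")
Image.fromarray(thin).save("scribble_thin.png")
```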
🔧 Technical Details
The model was trained on a large amount of high-quality data (over 10,000,000 images), carefully filtered and captioned (with a powerful vllm model). Useful tricks were applied during training, including data augmentation, multiple losses, and multi-resolution training. Together these techniques let the model reach a good balance between aesthetic quality and controllability.
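The card does not detail the multi-resolution scheme. As an assumption-laden sketch only, the snippet below shows one common interpretation, aspect-ratio bucketing: each image is resized so its pixel count stays near 1024x1024 while its aspect ratio is preserved, mirroring the resize logic in the quick-start code. The 64-pixel rounding and target area are illustrative choices, not the authors' recipe.

```python
import numpy as np

def to_resolution_bucket(width, height, target_pixels=1024 * 1024, multiple=64):
    # Scale so the area is roughly target_pixels while keeping the aspect ratio,
    # then snap both sides to a multiple of 64, as SDXL-style training commonly does.
    ratio = np.sqrt(target_pixels / (width * height))
    new_w = max(multiple, int(round(width * ratio / multiple)) * multiple)
    new_h = max(multiple, int(round(height * ratio / multiple)) * multiple)
    return new_w, new_h

print(to_resolution_bucket(1920, 1080))  # a 16:9 photo maps to a wide bucket such as (1344, 768)
```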
📄 License
This model is released under the apache-2.0 license.

