controlnet-scribble-sdxl-1.0: Open-Source Line-Art Model for Generating Midjourney-Quality Images for Free


Controlnet Scribble Sdxl 1.0

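Developed by xinsir

An all-round line-art model that generates images comparable to Midjourney and supports lines of any type and width

Image generation · License: Apache-2.0 · #line-art-controlled-generation #high-aesthetic-quality #multi-line-type-compatible

Downloads: 20.34k

Released: 5/12/2024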

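Model Overview

A powerful ControlNet trained on a large amount of high-quality data. It generates high-resolution images whose visual quality rivals Midjourney and supports many kinds of line art, including scribbles, Canny edges, HED edges, and PIDI edges.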

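Model Features

High-quality image generation

Output quality comparable to Midjourney, with support for high-resolution output

Multiple line-art types

Compatible with scribbles, Canny edges, HED edges, PIDI edges, and other line-art types

Flexible control

Supports lines of any type and width; sketches can be extremely rough and prompts do not need to be complex

Large-scale training data

Trained on more than 10 million carefully filtered images, captioned by a powerful vision-language model (VLLM)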

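Model Capabilities

Generates high-quality images from line art

Supports multiple edge-detection algorithms

Produces visually appealing images while keeping strong control

Supports a coarse-to-fine drawing workflow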

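Use Cases

Artistic creation

Concept art design

Generating concept art such as a surreal magical "Tome of the Sun God"

In the style of Age of Wonders, with Unreal Engine 5 rendering

Character portraits

Generating detailed portraits, such as a 17-year-old girl with long, straight black hair

Blends realism with fantasy elements for photorealistic results

Commercial design

Logo design

Designing a logo for the "District 7" paintball field

Features bright paintball elements with strong visual impact

Vector art

Creating a vector illustration of the nighttime explosion at Nakatomi Plaza from Die Hard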

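🚀 ControlNet Scribble SDXL-1.0

This is a powerful ControlNet model that generates high-resolution images with visual quality comparable to Midjourney. It supports lines of any type and width: a simple sketch and a short prompt are enough to produce highly appealing images.

🚀 Quick Start

Use the following code to get started with the model: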

from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers import EulerAncestralDiscreteScheduler
from controlnet_aux import HEDdetector
from PIL import Image
import torch
import numpy as np
import cv2
import random


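def nms(x, t, s):
    # Directional non-maximum suppression: thins a soft edge map into thin lines.
    # x: edge map, t: binarization threshold (0-255), s: Gaussian blur sigma.
    x = cv2.GaussianBlur(x.astype(np.float32), (0, 0), s)

    # 3x3 kernels for the horizontal, vertical and two diagonal directions.
    f1 = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=np.uint8)
    f2 = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8)
    f3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.uint8)
    f4 = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=np.uint8)

    y = np.zeros_like(x)

    # Keep a pixel if it is a local maximum along at least one direction.
    for f in [f1, f2, f3, f4]:
        np.putmask(y, cv2.dilate(x, kernel=f) == x, x)

    # Binarize: surviving pixels above t become white (255) lines.
    z = np.zeros_like(y, dtype=np.uint8)
    z[y > t] = 255
    return z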


controlnet_conditioning_scale = 1.0
prompt = "your prompt, the longer the better, you can describe it in as much detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'


eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")


controlnet = ControlNetModel.from_pretrained(
    "xinsir/controlnet-scribble-sdxl-1.0",
    torch_dtype=torch.float16
)

# When testing with another base model, change the VAE accordingly.
# madebyollin/sdxl-vae-fp16-fix is an SDXL VAE patched to run in float16.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
)
pipe.to("cuda")  # float16 weights generally need a GPU

# You can either use HED to generate a fake scribble from an existing image (Method 1)
# or use a sketch you drew entirely by yourself (Method 2).
# The example picks one at random; in practice, set use_hed to True or False explicitly.
use_hed = random.random() > 0.5

if use_hed:
    # Method 1
    # With HED you provide an image (real or anime); its HED edge lines are extracted and used as the scribble.
    # For details on HED-based fake scribbles, see https://github.com/lllyasviel/ControlNet/blob/main/gradio_fake_scribble2image.py
    # Below is an example using the HED detector from controlnet_aux.
    input_image = cv2.imread("your image path, the image can be real or anime, HED detector will extract its edge boundary")
    input_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
    processor = HEDdetector.from_pretrained('lllyasviel/Annotators')
    controlnet_img = processor(input_image, scribble=False)
    controlnet_img.save("a hed detect path for an image")

    # The following processing simulates a human sketch; different thresholds give different line widths.
    controlnet_img = np.array(controlnet_img)
    controlnet_img = nms(controlnet_img, 127, 3)
    controlnet_img = cv2.GaussianBlur(controlnet_img, (0, 0), 3)

    # Higher threshold, thinner line.
    random_val = int(round(random.uniform(0.01, 0.10), 2) * 255)
    controlnet_img[controlnet_img > random_val] = 255
    controlnet_img[controlnet_img < 255] = 0
    controlnet_img = Image.fromarray(controlnet_img)

else:
    # Method 2
    # Use a sketch you drew entirely by yourself.
    control_path = "the sketch image you draw with some tools, like drawing board, the path you save it"
    controlnet_img = Image.open(control_path)  # Note: the image must be pure black and white (0 or 255), like the examples listed

# Resize to about 1024*1024 pixels (or the same resolution bucket) for the best results.
width, height = controlnet_img.size
ratio = np.sqrt(1024. * 1024. / (width * height))
# Round each side down to a multiple of 8 so the image matches SDXL's 8x latent downsampling.
new_width, new_height = int(width * ratio) // 8 * 8, int(height * ratio) // 8 * 8
controlnet_img = controlnet_img.resize((new_width, new_height))

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=controlnet_img,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=new_width,
    height=new_height,
    num_inference_steps=30,
    ).images

images[0].save("your image save path, png usually gives better image quality than jpg or webp but produces much larger files")
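The quick-start code rescales the control image so its area is about 1024×1024 pixels while keeping the aspect ratio. The following is a standalone sketch of that arithmetic. The helper name `fit_to_resolution_bucket` is hypothetical, and rounding each side down to a multiple of 8 is an assumption on our part (SDXL's VAE downsamples by a factor of 8, so other sizes can cause shape mismatches).

```python
import math


def fit_to_resolution_bucket(width: int, height: int,
                             target_pixels: int = 1024 * 1024,
                             multiple: int = 8) -> tuple[int, int]:
    """Hypothetical helper: scale (width, height) so the area is close to
    target_pixels, keep the aspect ratio, and round each side down to a
    multiple of `multiple` (assumed to be 8 for SDXL)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = math.sqrt(target_pixels / (width * height))
    # Round to the nearest integer first so float noise such as 767.9999
    # does not drop a whole step of `multiple`, then floor to the multiple.
    new_width = round(width * ratio) // multiple * multiple
    new_height = round(height * ratio) // multiple * multiple
    return new_width, new_height


if __name__ == "__main__":
    for size in [(1024, 1024), (1920, 1080), (512, 768)]:
        print(size, "->", fit_to_resolution_bucket(*size))
```

For example, a 1920×1080 control image maps to 1360×768: the aspect ratio is roughly preserved and both sides are divisible by 8.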

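✨ Key Features

Powerful image generation: produces high-resolution images with visual quality comparable to Midjourney.
Broad line support: supports lines of any type and width; both the sketch and the prompt can be very simple.
High aesthetics: trained on a large amount of high-quality data using data augmentation, multiple losses, and multi-resolution training, it achieves better aesthetics than xinsir/controlnet-canny-sdxl-1.0.
Strong control: if you are not satisfied with a local region of the generated image, drawing a more precise sketch and writing a more detailed prompt helps considerably.
Multiple line types: supports line art (scribbles) as well as Canny lines.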

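📦 Installation

The original documentation does not provide installation steps, so this section is omitted.

💻 Usage Examples

Basic Usage

The code in the Quick Start section above is the basic usage example: follow its steps to generate an image from a simple sketch and a prompt.

Advanced Usage

The original documentation provides no advanced usage examples, so this section is omitted.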
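Method 2 in the quick-start code requires a hand-drawn sketch that is pure black and white (0 or 255). The following is a minimal NumPy sketch of that conversion. It assumes the model expects white lines on a black background, which is what the HED branch of the quick-start produces, so dark pen strokes on white paper are inverted by default. The function name `binarize_sketch` and the threshold of 128 are illustrative assumptions and are not part of the model card.

```python
import numpy as np


def binarize_sketch(gray: np.ndarray, threshold: int = 128,
                    dark_lines: bool = True) -> np.ndarray:
    """Hypothetical helper: map a grayscale sketch (H x W, 0-255) to pure 0/255.

    Assumption: the output uses white (255) lines on a black (0) background,
    like the HED branch of the quick-start. Pass dark_lines=False if your
    sketch already has light lines on a dark background.
    """
    if gray.ndim != 2:
        raise ValueError("expected a single-channel (H, W) array")
    if dark_lines:
        line_mask = gray < threshold   # pen strokes darker than the paper
    else:
        line_mask = gray >= threshold  # strokes already brighter than the background
    return np.where(line_mask, 255, 0).astype(np.uint8)


if __name__ == "__main__":
    # A 5x5 white "paper" with one dark vertical stroke in the middle column.
    sketch = np.full((5, 5), 250, dtype=np.uint8)
    sketch[:, 2] = 10
    print(binarize_sketch(sketch))
```

With PIL, you could load the sketch with `np.array(Image.open(path).convert("L"))` and pass `Image.fromarray(result).convert("RGB")` to the pipeline as the control image.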

📚 Detailed Documentation

Model Details

Model Description

Developer: xinsir
Model type: ControlNet_SDXL
License: apache-2.0
Fine-tuned from: stabilityai/stable-diffusion-xl-base-1.0

Model Sources

Paper: https://arxiv.org/abs/2302.05543

Examples

The following prompts were used for example images generated with this model. All examples were generated with stabilityai/stable-diffusion-xl-base-1.0 and xinsir/controlnet-scribble-sdxl-1.0:

Prompt: purple feathered eagle with specks of light like stars in feathers. It glows with arcane power
Prompt: manga girl in the city, drip marketing
Prompt: 17 year old girl with long dark hair in the style of realism with fantasy elements, detailed botanical illustrations, barbs and thorns, ethereal, magical, black, purple and maroon, intricate, photorealistic
Prompt: a logo for a paintball field named district 7 on a white background featuring paintballs the is bright and colourful eye catching and impactuful
Prompt: a photograph of a handsome crying blonde man with his face painted in the pride flag
Prompt: simple flat sketch fox play ball
Prompt: concept art, a surreal magical Tome of the Sun God, the book binding appears to be made of solar fire and emits a holy, radiant glow, Age of Wonders, Unreal Engine v5
Prompt: black Caribbean man walking balance front his fate chaos anarchy liberty independence force energy independence cinematic surreal beautiful rendition intricate sharp detail 8k
Prompt: die hard nakatomi plaza, explosion at the top, vector, night scene
Prompt: solitary glowing yellow tree in a desert. ultra wide shot. night time. hdr photography

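Evaluation Data

The test data was randomly sampled from upscaled Midjourney images together with their prompts. The project aims to let people create Midjourney-like images, and Midjourney's users include many professional designers, so its upscaled images tend to have higher aesthetic scores and better prompt consistency; this makes them a suitable test set for evaluating ControlNet's capabilities. 300 prompt-image pairs were randomly selected and 4 images were generated per prompt, for 1,200 images in total. Aesthetics was measured with the LAION aesthetic score and control ability with perceptual similarity, and image quality was found to agree well with these metrics. The model was compared with other SOTA models on Hugging Face; its results are as follows: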
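The evaluation protocol above (300 prompt-image pairs, 4 images per prompt, 1,200 images in total) can be sketched as a simple averaging loop. Everything here is an assumption. The source does not say how scores are aggregated, so a plain mean is assumed here. It also does not say exactly which images the perceptual similarity compares. `evaluate_controlnet`, `generate`, `aesthetic_score`, and `control_distance` are hypothetical names; you would back the callables with your own pipeline, a LAION aesthetic predictor, and a perceptual metric.

```python
from statistics import mean
from typing import Any, Callable, Dict, Sequence, Tuple


def evaluate_controlnet(
    pairs: Sequence[Tuple[str, Any]],
    generate: Callable[[str, Any], Any],
    aesthetic_score: Callable[[Any], float],
    control_distance: Callable[[Any, Any], float],
    images_per_prompt: int = 4,
) -> Dict[str, Any]:
    """Hypothetical sketch: average both metrics over every generated image.

    pairs: (prompt, reference) tuples; the reference is whatever the control
    image is derived from. All three callables are placeholders.
    """
    aesthetic, distance = [], []
    for prompt, reference in pairs:
        for _ in range(images_per_prompt):
            image = generate(prompt, reference)
            aesthetic.append(aesthetic_score(image))
            distance.append(control_distance(image, reference))
    return {
        "num_images": len(aesthetic),
        "laion_aesthetic": mean(aesthetic),        # higher is better
        "perceptual_similarity": mean(distance),   # lower is better
    }


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs without any models.
    fake_pairs = [(f"prompt {i}", i) for i in range(300)]
    print(evaluate_controlnet(
        fake_pairs,
        generate=lambda prompt, ref: ref,
        aesthetic_score=lambda img: 6.0,
        control_distance=lambda img, ref: 0.5,
    ))
```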

Quantitative Results

Metric	xinsir/controlnet-scribble-sdxl-1.0
LAION aesthetic score (higher is better)	6.03
Perceptual similarity (lower is better)	0.5701

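Note: these values were computed on images saved as webp. Saving as png raises the aesthetic score by 0.1 to 0.3, but the relative ranking stays the same.

Conclusion

In the evaluation, the model generated visually appealing images from simple sketches and prompts. It supports lines of any type and width: thick lines give looser control and follow the prompt more closely, while thin lines give stronger control and follow the conditioning image more closely. The model scores higher on aesthetics than xinsir/controlnet-canny-sdxl-1.0, but its control ability is slightly weaker because it also accepts thick lines.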
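The trade-off between thick and thin lines mirrors the blur-and-threshold step in the quick-start code ("higher threshold, thinner line"). The pure-NumPy sketch below reproduces that step on a synthetic 1-pixel line. It swaps `cv2.GaussianBlur` for a small hand-written separable Gaussian so it runs without OpenCV, and the helper names are hypothetical. It shows only how the threshold sets stroke width, not how the model responds to that width.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with zero padding at the borders."""
    k = gaussian_kernel(sigma)
    out = img.astype(np.float64)
    out = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, out)
    out = np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, out)
    return out


def simulate_stroke(line_img: np.ndarray, threshold: float, sigma: float = 3.0) -> np.ndarray:
    """Blur thin white lines, then keep pixels brighter than `threshold` as 255."""
    blurred = gaussian_blur(line_img, sigma)
    return np.where(blurred > threshold, 255, 0).astype(np.uint8)


def stroke_width(binary: np.ndarray) -> int:
    """Count white pixels along the middle row (width of a vertical stroke)."""
    return int((binary[binary.shape[0] // 2] == 255).sum())


if __name__ == "__main__":
    canvas = np.zeros((41, 41), dtype=np.uint8)
    canvas[:, 20] = 255  # a 1-pixel-wide vertical line
    for t in (3, 10, 25):
        print(f"threshold={t:>2}  stroke width={stroke_width(simulate_stroke(canvas, t))} px")
```

With sigma 3 (as in the quick-start), thresholds of 3, 10, and 25 give strokes 13, 9, and 5 pixels wide. These values lie within the roughly 2 to 25 range that the quick-start's random threshold produces, so lower thresholds yield the thicker, looser scribbles described above.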