controlnet-canny-sdxl-1.0 open-source image generation model: precise, high-quality image generation guided by edge detection

Controlnet Canny Sdxl 1.0

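Developed by xinsir

A powerful ControlNet model that generates high-resolution images visually comparable to Midjourney, with precise control through Canny edge detection.

Image generation | License: Apache-2.0 | #High-resolution image generation #Canny edge control #Midjourney-level quality

Downloads: 25.79k

Released: 5/10/2024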

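Model Overview

The model is fine-tuned from Stable Diffusion XL 1.0 and focuses on text-to-image generation. It is particularly good at producing detailed, high-quality images whose composition is controlled by a Canny edge map.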

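Model Features

High-quality generation

Trained on more than 10 million curated images; output quality is comparable to Midjourney

Precise control

Uses Canny edge detection to control composition and handles complex scenes

Multiple styles

Supports photorealistic and anime styles (anime requires switching the base model)

Advanced training techniques

Uses data augmentation, multiple losses, and multi-resolution training to improve model performance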

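Model Capabilities

Text-to-image generation

Composition control through edge maps

High-resolution image generation

Multi-style image generation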

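Use Cases

Art creation

Concept art design

Generate complete concept art from line drawings

Can produce complex, ornate compositions (such as the Day of the Dead theme in the examples)

Illustration

Turn simple line drawings into finished illustrations

Supports watercolor, oil painting, and other art styles (such as the Waterhouse style in the examples)

Commercial design

Product showcase

Generate product promotional images

Can produce professional-grade food photography (such as the pizza image in the examples)

Advertising design

Quickly generate advertising concept images

Supports holiday themes and other commercial scenarios (such as the star background image in the examples)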

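🚀 ControlNet-Canny-SDXL-1.0

This model is a powerful ControlNet that generates high-resolution images visually comparable to Midjourney, advancing the practical use of Stable Diffusion models.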

🚀 Quick Start

You can run the model with the following code:

from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
from diffusers import DDIMScheduler, EulerAncestralDiscreteScheduler
from PIL import Image
import torch
import numpy as np
import cv2

def HWC3(x):
    assert x.dtype == np.uint8
    if x.ndim == 2:
        x = x[:, :, None]
    assert x.ndim == 3
    H, W, C = x.shape
    assert C == 1 or C == 3 or C == 4
    if C == 3:
        return x
    if C == 1:
        return np.concatenate([x, x, x], axis=2)
    if C == 4:
        color = x[:, :, 0:3].astype(np.float32)
        alpha = x[:, :, 3:4].astype(np.float32) / 255.0
        y = color * alpha + 255.0 * (1.0 - alpha)
        y = y.clip(0, 255).astype(np.uint8)
        return y

controlnet_conditioning_scale = 1.0  
prompt = "your prompt, the longer the better, you can describe it as detail as possible"
negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'

eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")

controlnet = ControlNetModel.from_pretrained(
    "xinsir/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)

# when test with other base model, you need to change the vae also.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

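pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    vae=vae,
    safety_checker=None,
    torch_dtype=torch.float16,
    scheduler=eulera_scheduler,
)
# float16 weights are intended for GPU inference; this assumes a CUDA device is available.
pipe.to("cuda")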

# need to resize the image resolution to 1024 * 1024 or same bucket resolution to get the best performance

controlnet_img = cv2.imread("your image path")
height, width, _  = controlnet_img.shape
ratio = np.sqrt(1024. * 1024. / (width * height))
new_width, new_height = int(width * ratio), int(height * ratio)
controlnet_img = cv2.resize(controlnet_img, (new_width, new_height))

controlnet_img = cv2.Canny(controlnet_img, 100, 200)
controlnet_img = HWC3(controlnet_img)
controlnet_img = Image.fromarray(controlnet_img)

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image=controlnet_img,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    width=new_width,
    height=new_height,
    num_inference_steps=30,
    ).images

# PNG usually preserves image quality better than JPG or WebP, at the cost of a much larger file.
images[0].save("your image save path")
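
The resizing step in the code above scales the input so that its area is close to 1024 * 1024 while keeping the aspect ratio. The sketch below isolates that arithmetic. The helper name `target_size` and the rounding to a multiple of 8 are assumptions added for illustration; the original snippet simply truncates with `int()`.

```python
import math


def target_size(width, height, target_area=1024 * 1024, multiple=8):
    """Return (new_width, new_height) with roughly `target_area` pixels.

    The aspect ratio is preserved up to rounding. Rounding each side to a
    multiple of 8 is an assumption, not something the original code does.
    """
    ratio = math.sqrt(target_area / (width * height))
    new_width = max(multiple, round(width * ratio / multiple) * multiple)
    new_height = max(multiple, round(height * ratio / multiple) * multiple)
    return new_width, new_height


# A 1920x1080 input is scaled down; a 512x512 input is scaled up.
landscape = target_size(1920, 1080)
small_square = target_size(512, 512)
```

The returned pair can be passed to `cv2.resize` and then reused as the `width` and `height` arguments of the pipeline call, as the original code does with `new_width` and `new_height`.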

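✨ Key Features

High-quality image generation: trained on a large amount of high-quality data (more than 10,000,000 images), the model generates high-resolution images visually comparable to Midjourney.
Rich training techniques: training used data augmentation, multiple losses, and multi-resolution training. After only a single training stage, the model outperforms other open-source Canny models.
Broad applicability: as one of the most important models in the ControlNet family, it can be applied to many painting and design tasks.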

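📚 Documentation

Model Details

Model Description

Developed by: xinsir
Model type: ControlNet_SDXL
License: apache-2.0
Fine-tuned from model (optional): stabilityai/stable-diffusion-xl-base-1.0

Model Sources (optional)

Paper (optional): https://arxiv.org/abs/2302.05543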

Uses

Examples

Prompt: A closeup of two day of the dead models, looking to the side, large flowered headdress, full dia de Los muertoe make up, lush red lips, butterflies, flowers, pastel colors, looking to the side, jungle, birds, color harmony , extremely detailed, intricate, ornate, motion, stunning, beautiful, unique, soft lighting
Prompt: ghost with a plague doctor mask in a venice carnaval hyper realistic
Prompt: A picture surrounded by blue stars and gold stars, glowing, dark navy blue and gray tones, distributed in light silver and gold, playful, festive atmosphere, pure fabric, chalk, FHD 8K
Prompt: Delicious vegetarian pizza with champignon mushrooms, tomatoes, mozzarella, peppers and black olives, isolated on white background , transparent isolated white background , top down view, studio photo, transparent png, Clean sharp focus. High end retouching. Food magazine photography. Award winning photography. Advertising photography. Commercial photography
Prompt: a blonde woman in a wedding dress in a maple forest in summer with a flower crown laurel. Watercolor painting in the style of John William Waterhouse. Romanticism. Ethereal light.

Anime examples (note: change the base model to CounterfeitXL and keep everything else unchanged)
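
One way to keep "everything else unchanged" when switching styles is to hold the three model identifiers in a small config object and replace only the base model. This is a minimal sketch: `PipelineConfig` is a hypothetical helper, and `"path/to/CounterfeitXL"` is a placeholder, since the page does not give a repository id for CounterfeitXL. Note that a comment in the quick-start code says the VAE may also need to change when testing other base models.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineConfig:
    """Model identifiers used by the quick-start code (hypothetical helper)."""
    base_model: str = "stabilityai/stable-diffusion-xl-base-1.0"
    controlnet: str = "xinsir/controlnet-canny-sdxl-1.0"
    vae: str = "madebyollin/sdxl-vae-fp16-fix"


realistic = PipelineConfig()

# Placeholder only: substitute the repository id or local path of your
# CounterfeitXL checkpoint. Nothing else in the config changes.
anime = replace(realistic, base_model="path/to/CounterfeitXL")
```

Each field would then be passed to the corresponding `from_pretrained` call in the quick-start code.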

Evaluation Metrics

LAION aesthetic score [https://laion.ai/blog/laion-aesthetics/]
Perceptual similarity [https://github.com/richzhang/PerceptualSimilarity]

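Evaluation Data

The test data was randomly sampled from Midjourney upscaled images together with their prompts. The goal of this project is to let people draw images the way Midjourney does. Midjourney's users include many professional designers, and its upscaled images tend to have higher aesthetic scores and better prompt consistency, so they make a suitable test set for judging a ControlNet's ability. We randomly selected 300 prompt-image pairs and generated 4 images per prompt, 1,200 images in total. We computed the LAION aesthetic score to measure image beauty and perceptual similarity to measure control ability, and found that image quality agrees well with the metric values. We compared our method with other SOTA models on Hugging Face and list the results below. Our model has the highest aesthetic score and can generate visually appealing images when prompted properly.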
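
The protocol above (300 prompt-image pairs, 4 generations each, two averaged metrics) can be sketched as a small loop. This is an illustration only: `generate`, `aesthetic_score`, and `perceptual_distance` are hypothetical callables supplied by the caller, and comparing each generated image with its reference image is an assumption, since the text does not state which pairs the perceptual similarity is computed on.

```python
from statistics import mean


def evaluate(pairs, generate, aesthetic_score, perceptual_distance,
             images_per_prompt=4):
    """Average both metrics over every generated image.

    pairs: iterable of (prompt, reference_image).
    generate(prompt, reference_image, seed) -> image      (hypothetical)
    aesthetic_score(image) -> float                        (hypothetical)
    perceptual_distance(image, reference_image) -> float   (hypothetical)
    """
    aesthetics, distances = [], []
    for prompt, reference in pairs:
        for seed in range(images_per_prompt):
            image = generate(prompt, reference, seed)
            aesthetics.append(aesthetic_score(image))
            distances.append(perceptual_distance(image, reference))
    return {
        "num_images": len(aesthetics),
        "laion_aesthetic": mean(aesthetics),
        "perceptual_similarity": mean(distances),
    }


# Dummy stand-ins so the sketch runs without any model.
dummy_pairs = [(f"prompt {i}", i) for i in range(300)]
summary = evaluate(
    dummy_pairs,
    generate=lambda prompt, reference, seed: (reference, seed),
    aesthetic_score=lambda image: 6.0,
    perceptual_distance=lambda image, reference: 0.5,
)
```

With 300 pairs and 4 images per prompt, `summary["num_images"]` is 1200, matching the count stated above.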

Quantitative Results

Metric	xinsir/controlnet-canny-sdxl-1.0	diffusers/controlnet-canny-sdxl-1.0	TheMistoAI/MistoLine
laion_aesthetic	6.03	5.93	5.82
perceptual similarity	0.4200	0.5053	0.5387

laion_aesthetic (higher is better)
perceptual similarity (lower is better)

Note: these values were computed on images saved as WebP. Saving as PNG raises the aesthetic score by 0.1 to 0.3, but the relative ranking stays the same.
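
Because the two metrics point in opposite directions, it is easy to misread the table. The sketch below ranks the three models using the values reported above; the names `RESULTS`, `HIGHER_IS_BETTER`, and `rank_models` are introduced here for illustration.

```python
# Values copied from the quantitative results table.
RESULTS = {
    "xinsir/controlnet-canny-sdxl-1.0": {
        "laion_aesthetic": 6.03, "perceptual_similarity": 0.4200},
    "diffusers/controlnet-canny-sdxl-1.0": {
        "laion_aesthetic": 5.93, "perceptual_similarity": 0.5053},
    "TheMistoAI/MistoLine": {
        "laion_aesthetic": 5.82, "perceptual_similarity": 0.5387},
}

# Direction of each metric, as stated in the legend.
HIGHER_IS_BETTER = {"laion_aesthetic": True, "perceptual_similarity": False}


def rank_models(metric):
    """Return model names ordered from best to worst for `metric`."""
    return sorted(
        RESULTS,
        key=lambda name: RESULTS[name][metric],
        reverse=HIGHER_IS_BETTER[metric],
    )


best_aesthetic = rank_models("laion_aesthetic")[0]
best_control = rank_models("perceptual_similarity")[0]
```

On the reported numbers, the same model ranks first on both metrics.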

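Training Details

The model was trained on high-quality data in a single stage, at the same resolution as sdxl-base, 1024 * 1024. Following Lvmin Zhang, we generate Canny images with random thresholds. Finding suitable hyperparameters for data augmentation is critical: augmentation that is too easy or too hard hurts model performance. We also randomly mask out a random percentage of each Canny image, which forces the model to learn more of the semantic relationship between the prompt and the lines. We used more than 10,000,000 carefully annotated images, and CogVLM proved to be a powerful image captioning model [https://github.com/THUDM/CogVLM?tab=readme-ov-file]. For comic images, we recommend using waifu tagger to generate special tags [https://huggingface.co/spaces/SmilingWolf/wd-tagger]. Training used more than 64 A100s, and the effective batch size with gradient accumulation was 2560.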
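
The two augmentations described above, random Canny thresholds and random masking of the edge map, can be sketched as follows. This is not the authors' implementation: the threshold ranges, the maximum masked fraction, and the choice to mask square cells rather than individual pixels are all assumptions, and the edge map is a plain list of rows so the sketch runs without extra dependencies.

```python
import random


def sample_canny_thresholds(rng, low_range=(50, 150), high_range=(150, 300)):
    """Draw a (low, high) threshold pair with low < high (ranges are assumed)."""
    low = rng.randint(*low_range)
    high = rng.randint(max(low + 1, high_range[0]), high_range[1])
    return low, high


def random_mask_edges(edge_map, rng, max_fraction=0.5, cell=4):
    """Zero out a random fraction of cell x cell patches in a copy of edge_map.

    Returns (masked_copy, sampled_fraction). Patch-based masking is an
    assumption; the source only says a random percentage is masked out.
    """
    height, width = len(edge_map), len(edge_map[0])
    cells = [(r, c) for r in range(0, height, cell)
             for c in range(0, width, cell)]
    fraction = rng.uniform(0.0, max_fraction)
    masked = [row[:] for row in edge_map]
    for r0, c0 in rng.sample(cells, int(len(cells) * fraction)):
        for r in range(r0, min(r0 + cell, height)):
            for c in range(c0, min(c0 + cell, width)):
                masked[r][c] = 0
    return masked, fraction


rng = random.Random(0)
low, high = sample_canny_thresholds(rng)
edges = [[255] * 16 for _ in range(16)]  # stand-in for a Canny edge map
masked_edges, masked_fraction = random_mask_edges(edges, rng)
```

In a real data loader, the sampled pair would be passed to `cv2.Canny(image, low, high)` and the masking would be applied to the resulting array.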

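For the batch size quoted in the training details, the usual relation is effective batch size = number of GPUs × per-GPU micro-batch × accumulation steps. The sketch below works through that arithmetic. The source only says "more than 64" A100s, so 64 is used as a lower bound, and the micro-batch size of 8 is purely hypothetical.

```python
def per_gpu_samples(effective_batch_size, num_gpus):
    """Samples each GPU must process per optimizer step."""
    if effective_batch_size % num_gpus:
        raise ValueError("effective batch size must divide evenly across GPUs")
    return effective_batch_size // num_gpus


def accumulation_steps(effective_batch_size, num_gpus, micro_batch_size):
    """Gradient accumulation steps needed to reach the effective batch size."""
    per_gpu = per_gpu_samples(effective_batch_size, num_gpus)
    if per_gpu % micro_batch_size:
        raise ValueError("per-GPU samples must be a multiple of the micro-batch")
    return per_gpu // micro_batch_size


# 2560 is from the source; 64 GPUs is a lower bound; micro-batch 8 is hypothetical.
samples = per_gpu_samples(2560, 64)
steps = accumulation_steps(2560, 64, micro_batch_size=8)
```

Under these assumptions each GPU contributes 40 samples per optimizer step, reached in 5 accumulation steps of 8.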
Training Data

The data comes from several sources, including Midjourney, LAION-5B, and Danbooru. It was carefully filtered and annotated.

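Conclusion

In our evaluation, the model achieved a better aesthetic score on real images than stabilityai/stable-diffusion-xl-base-1.0 and comparable performance on cartoon-style images. Thanks to stronger data augmentation and more training steps, it also shows better control ability when measured by perceptual similarity. In addition, it is less likely to generate abnormal images, which typically contain distorted human anatomy.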