controlnet-inpaint-endpoint open-source model - image-inpainting conditional control built on Stable Diffusion

Controlnet Inpaint Endpoint

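Developed by OrderAndChaos

ControlNet v1.1 is a neural network structure that works with Stable Diffusion and controls the diffusion model through an image-inpainting condition.

Category: Image generation | License: Openrail (other open-source license) | #Image inpainting #ControlNet control #Stable Diffusion integration

Downloads: 56

Released: 5/23/2023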

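Model overview

This model is the image-inpainting version of ControlNet. It can be combined with Stable Diffusion and steers image generation by adding an inpainting condition.

Model features

Inpainting control

Given an image and a mask, it controls which region is inpainted and what content fills it.

Compatible with Stable Diffusion

Designed to be used with Stable Diffusion v1-5, the base model it was trained on.

End-to-end learning

Learns task-specific conditions end to end, and training remains robust even on small datasets.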

Model capabilities

Image inpainting

Image generation

Conditionally controlled generation

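Use cases

Art creation

Restoring damaged artwork

Use the model to restore old photos or damaged artwork and recover lost detail.

Result: a high-quality restored image that preserves the original style and detail.

Design

Product design edits

Quickly modify specific parts of a product image during the design process without redrawing it.

Result: a seamlessly modified design image that stays consistent overall.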

🚀 Controlnet - v1.1 - InPaint Version

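Controlnet v1.1 is a neural network structure for controlling diffusion models; it extends a diffusion model by adding extra conditions. This is the InPaint version, which can be combined with Stable Diffusion for tasks such as image inpainting.

🚀 Quick start

Environment setup

First, install diffusers and its dependencies: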

pip install diffusers transformers accelerate
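
Before running the example, it can help to confirm that the environment actually has the packages it imports. The following is a minimal sketch using only the standard library; the package list is an assumption taken from the install command plus the torch and numpy imports used in the example code, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed package list: the install command plus the extra imports in the example.
REQUIRED = ("diffusers", "transformers", "accelerate", "torch", "numpy")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the packages are importable; it does not check versions or GPU support.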

Code example

The following example uses this model for image inpainting:

import os

import numpy as np
import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)
from diffusers.utils import load_image

checkpoint = "lllyasviel/control_v11p_sd15_inpaint"
original_image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/original.png"
)
mask_image = load_image(
    "https://huggingface.co/lllyasviel/control_v11p_sd15_inpaint/resolve/main/images/mask.png"
)


def make_inpaint_condition(image, image_mask):
    # Scale the RGB image to [0, 1] and read the mask as 8-bit grayscale.
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L"))
    # Height and width must both match.
    assert image.shape[0:2] == image_mask.shape[0:2], "image and image_mask must have the same image size"
    image[image_mask < 128] = -1.0  # set as masked pixel
    # HWC -> NCHW with a batch dimension of 1.
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image


control_image = make_inpaint_condition(original_image, mask_image)
prompt = "best quality"
negative_prompt = "lowres, bad anatomy, bad hands, cropped, worst quality"

controlnet = ControlNetModel.from_pretrained(checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

generator = torch.manual_seed(2)
image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=30,
             generator=generator, image=control_image).images[0]

# Create the output directory first so that save() does not fail.
os.makedirs("images", exist_ok=True)
image.save("images/output.png")
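
To see what the conditioning step produces without downloading any model, the masking logic can be tried on a tiny synthetic array. This is a sketch under stated assumptions: it works on NumPy arrays instead of PIL images, skips the final conversion to a torch tensor, and `inpaint_condition_array` is a hypothetical name, not part of diffusers.

```python
import numpy as np


def inpaint_condition_array(image, mask):
    """image: HxWx3 uint8 array, mask: HxW uint8 array. Returns 1x3xHxW float32."""
    assert image.shape[:2] == mask.shape[:2], "image and mask must have the same size"
    out = image.astype(np.float32) / 255.0
    out[mask < 128] = -1.0  # same threshold as the example code
    return out[None].transpose(0, 3, 1, 2)


# A 2x2 all-white image whose mask is dark only at the top-left pixel.
image = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.array([[0, 255], [255, 255]], dtype=np.uint8)

cond = inpaint_condition_array(image, mask)
print(cond.shape)         # layout is (batch, channels, height, width)
print(cond[0, :, 0, 0])   # top-left pixel: all three channels set to -1.0
print(cond[0, :, 0, 1])   # neighbouring pixel: original value scaled to 1.0
```

Pixels set to -1.0 fall outside the normal [0, 1] range, which is how the conditioning image marks them as masked.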

Example images

Example images: original, mask, inpaint_output

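✨ Key features

Flexible control: adding extra conditions gives flexible control over what the diffusion model generates.
Compatible with Stable Diffusion: works with Stable Diffusion models such as runwayml/stable-diffusion-v1-5.
Multiple application scenarios: suited to image inpainting, style transfer, and other image generation tasks.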

📦 Installation

Install diffusers and its dependencies:

pip install diffusers transformers accelerate

💻 Usage examples

Basic usage

import base64
import requests

# Placeholders: replace with your own token and endpoint URL.
HF_TOKEN = 'hf_xxxxxxxxxxxxx'
API_ENDPOINT = 'https://xxxxxxxxxxx.us-east-1.aws.endpoints.huggingface.cloud'


def load_image(path):
    """Read a file as raw bytes; return None if it does not exist."""
    try:
        with open(path, 'rb') as file:
            return file.read()
    except FileNotFoundError as error:
        print('Error reading image:', error)
        return None


def get_b64_image(path):
    """Return the file contents as a base64 string, or None on failure."""
    image_buffer = load_image(path)
    if image_buffer:
        return base64.b64encode(image_buffer).decode('utf-8')
    return None


def save_image(blob, file_path):
    with open(file_path, 'wb') as file:
        file.write(blob)
    print('File saved successfully!')


def process_images(original_image_path, mask_image_path, result_path, prompt, width, height):
    original_b64 = get_b64_image(original_image_path)
    mask_b64 = get_b64_image(mask_image_path)

    if not original_b64 or not mask_b64:
        return

    body = {
        'inputs': prompt,
        'image': original_b64,
        'mask_image': mask_b64,
        'width': width,
        'height': height
    }

    headers = {
        'Authorization': f'Bearer {HF_TOKEN}',
        'Content-Type': 'application/json',
        'Accept': 'image/png'
    }

    response = requests.post(
        API_ENDPOINT,
        json=body,
        headers=headers
    )
    # Stop on HTTP errors so an error body is never written to disk as a PNG.
    response.raise_for_status()
    blob = response.content

    save_image(blob, result_path)


if __name__ == '__main__':
    original_image_path = 'images/original.png'
    mask_image_path = 'images/mask.png'
    result_path = 'images/result.png'
    process_images(original_image_path, mask_image_path, result_path, 'cyberpunk mona lisa', 512, 768)
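
The request body and the response handling can be checked offline before any call is made to the endpoint. The sketch below uses only the standard library; the field names mirror the request body in the example, while `build_payload` and `looks_like_png` are hypothetical helper names and the byte strings are dummy data rather than real images.

```python
import base64
import json

# The fixed 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_payload(prompt, image_bytes, mask_bytes, width, height):
    """Assemble the JSON-serializable request body used by the example client."""
    return {
        "inputs": prompt,
        "image": base64.b64encode(image_bytes).decode("utf-8"),
        "mask_image": base64.b64encode(mask_bytes).decode("utf-8"),
        "width": width,
        "height": height,
    }


def looks_like_png(blob):
    """Cheap sanity check before writing a response body to a .png file."""
    return blob.startswith(PNG_SIGNATURE)


# Dummy bytes stand in for real image files here.
payload = build_payload("cyberpunk mona lisa", b"fake-image", b"fake-mask", 512, 768)
body = json.dumps(payload)
print(len(body), "characters of JSON")
print(looks_like_png(b'{"error": "something went wrong"}'))
```

Checking the signature before saving is one way to avoid storing a JSON error message under a `.png` file name.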

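Advanced usage

This model can also be combined with other diffusion models, such as a dreamboothed Stable Diffusion, but pairing it with Stable Diffusion v1-5 is recommended because the checkpoint was trained on that base model.

📚 Detailed documentation

Model introduction

ControlNet v1.1 was released by Lvmin Zhang in lllyasviel/ControlNet-v1-1. This checkpoint is a conversion of the original checkpoint into the diffusers format and can be used together with Stable Diffusion.

How the model works

ControlNet is a neural network structure that controls diffusion models by adding extra conditions. It feeds additional condition information into an existing diffusion model so that the generated result can be steered precisely.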
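
As a rough illustration of how a condition can be added without disturbing the base model, the toy sketch below follows the formulation in the cited paper (arXiv 2302.05543): a frozen block, a trainable copy, and zero-initialized connecting layers. Everything here is a simplified stand-in chosen for illustration. The blocks are scalar functions on lists rather than neural network layers, and all function names are hypothetical.

```python
def frozen_block(x):
    # Stand-in for a pretrained block whose weights stay fixed.
    return [2.0 * v + 1.0 for v in x]


def trainable_copy(x):
    # Stand-in for the trainable copy, which starts out identical to the frozen block.
    return [2.0 * v + 1.0 for v in x]


def zero_conv(x, weight):
    # A 1x1 convolution reduced to a single scalar weight; it is 0.0 at initialization.
    return [weight * v for v in x]


def controlled_block(x, condition, w_in, w_out):
    # Inject the condition into the copy's input, then add the copy's output as a residual.
    injected = [a + b for a, b in zip(x, zero_conv(condition, w_in))]
    residual = zero_conv(trainable_copy(injected), w_out)
    return [a + b for a, b in zip(frozen_block(x), residual)]


x = [1.0, 2.0]
condition = [5.0, 5.0]
print(controlled_block(x, condition, 0.0, 0.0))  # zero weights: same as frozen_block(x)
print(controlled_block(x, condition, 1.0, 0.5))  # non-zero weights: condition changes the output
```

With both connecting weights at zero the output equals the frozen block's output, so in this toy setting the condition only starts to influence the result once those weights move away from zero.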

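🔧 Technical details

Model information

Property	Details
Model type	Diffusion-based text-to-image generation model
Training data	Not specified
Developers	Lvmin Zhang, Maneesh Agrawala
Language	English
License	The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing.
More resources	GitHub repository, paper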

Citation

@misc{zhang2023adding,
    title={Adding Conditional Control to Text-to-Image Diffusion Models}, 
    author={Lvmin Zhang and Maneesh Agrawala},
    year={2023},
    eprint={2302.05543},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

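Paper abstract

The paper presents ControlNet, a neural network structure for controlling large pretrained diffusion models so that they support additional input conditions. ControlNet learns task-specific conditions end to end, and learning is robust even when the training dataset is small (< 50k). Training a ControlNet is also as fast as fine-tuning a diffusion model, and it can be done on a personal device. If powerful compute clusters are available, the model can instead scale to large amounts (millions to billions) of data. The experiments show that large diffusion models such as Stable Diffusion can be augmented with ControlNet to accept conditional inputs such as edge maps, segmentation maps, and keypoints. This may enrich the ways of controlling large diffusion models and further facilitate related applications.

📄 License

This model is released under The CreativeML OpenRAIL M license.

Other released checkpoints v1-1

The authors released 14 different checkpoints, each trained on Stable Diffusion v1-5 with a different type of conditioning: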

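Model name	Control image overview	Condition image
lllyasviel/control_v11p_sd15_canny	Trained with Canny edge detection	A monochrome image with white edges on a black background.
lllyasviel/control_v11e_sd15_ip2p	Trained with pixel-to-pixel instruction	No condition.
lllyasviel/control_v11p_sd15_inpaint	Trained with image inpainting	No condition.
lllyasviel/control_v11p_sd15_mlsd	Trained with multi-level line segment detection	An image with annotated line segments.
lllyasviel/control_v11f1p_sd15_depth	Trained with depth estimation	An image with depth information, usually represented as a grayscale image.
lllyasviel/control_v11p_sd15_normalbae	Trained with surface normal estimation	An image with surface normal information, usually represented as a color-coded image.
lllyasviel/control_v11p_sd15_seg	Trained with image segmentation	An image with segmented regions, usually represented as a color-coded image.
lllyasviel/control_v11p_sd15_lineart	Trained with line art generation	An image with line art, usually black lines on a white background.
lllyasviel/control_v11p_sd15s2_lineart_anime	Trained with anime line art generation	An image with anime-style line art.
lllyasviel/control_v11p_sd15_openpose	Trained with human pose estimation	An image with human poses, usually represented as a set of keypoints or a skeleton.
lllyasviel/control_v11p_sd15_scribble	Trained with scribble-based image generation	An image with scribbles, usually random or user-drawn strokes.
lllyasviel/control_v11p_sd15_softedge	Trained with soft edge image generation	An image with soft edges, usually used to create a more painterly or artistic effect.
lllyasviel/control_v11e_sd15_shuffle	Trained with image shuffling	An image with shuffled patches or regions.
lllyasviel/control_v11f1e_sd15_tile	Trained with image tiling	A blurry image or part of an image.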
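
When switching between these checkpoints in code, a small lookup table avoids retyping the full model IDs. The sketch below copies the IDs from the table above; the short task keys and the `checkpoint_for` helper are hypothetical names chosen for illustration.

```python
# Model IDs copied from the table; the short keys are illustrative only.
CHECKPOINTS = {
    "canny": "lllyasviel/control_v11p_sd15_canny",
    "ip2p": "lllyasviel/control_v11e_sd15_ip2p",
    "inpaint": "lllyasviel/control_v11p_sd15_inpaint",
    "mlsd": "lllyasviel/control_v11p_sd15_mlsd",
    "depth": "lllyasviel/control_v11f1p_sd15_depth",
    "normalbae": "lllyasviel/control_v11p_sd15_normalbae",
    "seg": "lllyasviel/control_v11p_sd15_seg",
    "lineart": "lllyasviel/control_v11p_sd15_lineart",
    "lineart_anime": "lllyasviel/control_v11p_sd15s2_lineart_anime",
    "openpose": "lllyasviel/control_v11p_sd15_openpose",
    "scribble": "lllyasviel/control_v11p_sd15_scribble",
    "softedge": "lllyasviel/control_v11p_sd15_softedge",
    "shuffle": "lllyasviel/control_v11e_sd15_shuffle",
    "tile": "lllyasviel/control_v11f1e_sd15_tile",
}


def checkpoint_for(task):
    """Return the model ID for a task key, with a readable error for unknown keys."""
    try:
        return CHECKPOINTS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; choose from {sorted(CHECKPOINTS)}"
        ) from None


print(checkpoint_for("inpaint"))
```

The returned string can be passed wherever a checkpoint name is expected, for example as the `checkpoint` variable in the inpainting example.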