SD3-Controlnet-Inpainting Open-source Image Inpainting Model - Supports High-resolution Image Inpainting and Text Generation

SD3 Controlnet Inpainting

Developed by alimama-creative

ControlNet inpainting model fine-tuned on SD3-medium, supporting high-resolution image inpainting and text generation

Image Generation | English | Open Source | License: Other
Tags: #High-resolution image inpainting #Text content generation #Portrait aesthetic optimization

Downloads 209

Release Time : 7/30/2024

Model Overview

This is a ControlNet inpainting model fine-tuned from SD3-medium. It preserves the non-inpainted areas of the input image and can generate text content inside the inpainted region.

Model Features

High-resolution inpainting

Leverages SD3's 16-channel VAE and its 1024x1024 high-resolution generation capability to preserve the integrity of non-inpainted areas
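A quick arithmetic check of what a 16-channel VAE means at 1024x1024. This sketch assumes SD3's VAE downsamples by a factor of 8 in each spatial dimension, as SDXL's 4-channel VAE does; `latent_shape` is a hypothetical helper, not part of any library.

```python
def latent_shape(height: int, width: int, channels: int = 16, factor: int = 8) -> tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for an image of the given size.

    Assumption: the VAE downsamples by `factor` in both spatial dimensions.
    """
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (channels, height // factor, width // factor)


sd3 = latent_shape(1024, 1024)                  # 16-channel latent (SD3)
four_ch = latent_shape(1024, 1024, channels=4)  # 4-channel latent (e.g. SDXL)

print(sd3)      # (16, 128, 128)
print(four_ch)  # (4, 128, 128)
# Ratio of latent values at the same image size: 262144 / 65536
print((sd3[0] * sd3[1] * sd3[2]) // (four_ch[0] * four_ch[1] * four_ch[2]))  # 4
```

At the same resolution the 16-channel latent holds four times as many values, which gives it more capacity to retain fine detail such as text in the unmasked areas.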

Text generation

Supports generating text content through inpainting, which is rare among image inpainting models

Portrait aesthetic performance

Demonstrates excellent aesthetic performance in portrait generation

Advantages over SDXL inpainting model

In the authors' side-by-side comparisons, preserves detail better and produces higher-quality results than SDXL inpainting models

Model Capabilities

Image inpainting

High-resolution image generation

Text content generation

Portrait generation

Use Cases

Image editing

Object replacement

Replace specific objects in an image with other objects

Example shows the effect of replacing a tiger on a park bench with a puppy
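For object replacement, the mask must cover the object being replaced. A minimal sketch of building a rectangular mask with Pillow, assuming white (255) marks the region to repaint and black (0) the region to keep; `make_box_mask` and the box coordinates are hypothetical.

```python
from PIL import Image, ImageDraw


def make_box_mask(size: tuple[int, int], box: tuple[int, int, int, int]) -> Image.Image:
    """Create a single-channel mask: 255 inside `box` (repaint), 0 elsewhere (keep)."""
    mask = Image.new("L", size, 0)
    # Pillow treats the rectangle's bounding box as inclusive of both corners.
    ImageDraw.Draw(mask).rectangle(box, fill=255)
    return mask


# Hypothetical box (left, top, right, bottom) around the object on a 1024x1024 image.
mask = make_box_mask((1024, 1024), (300, 400, 700, 900))
```

A box is the simplest choice; a mask that roughly follows the object's outline leaves more of the original scene untouched.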

Fashion design

Modify clothing styles for characters

Example shows the effect of modifying a woman's dress style

Brand element addition

Add brand logos or text to images

Example shows the effect of adding brand text to hats and buckets

Interior design

Furniture arrangement

Modify indoor furniture arrangements

Example shows the effect of adding an air conditioner to a bedroom wall

🚀 SD3 Controlnet Inpainting Model

This is a ControlNet inpainting model fine-tuned from SD3. It offers distinctive capabilities in text-to-image generation and inpainting, and it has been integrated into the Diffusers library for convenient use.

🚀 Quick Start

This model has been merged into Diffusers and can now be used conveniently.

✨ Features

By leveraging SD3's 16-channel VAE and its 1024x1024 high-resolution generation capability, the model effectively preserves the integrity of non-inpainted regions, including text.
It is capable of generating text through inpainting.
It demonstrates superior aesthetic performance in portrait generation.
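One way to check the preservation claim on your own outputs is to measure how much the unmasked pixels changed. A minimal sketch with NumPy and Pillow, assuming mask pixels of 128 or above mark the inpainted region; `unmasked_change` is a hypothetical helper.

```python
import numpy as np
from PIL import Image


def unmasked_change(original: Image.Image, result: Image.Image, mask: Image.Image) -> float:
    """Mean absolute RGB difference (0-255 scale) over pixels outside the mask.

    Assumption: mask pixels >= 128 mark the inpainted region and are excluded.
    """
    a = np.asarray(original.convert("RGB"), dtype=np.float32)
    b = np.asarray(result.convert("RGB"), dtype=np.float32)
    keep = np.asarray(mask.convert("L")) < 128  # True where the image should be untouched
    if a.shape != b.shape or keep.shape != a.shape[:2]:
        raise ValueError("original, result and mask must share the same size")
    if not keep.any():
        return 0.0
    # Boolean indexing selects (N, 3) kept pixels; average over all their channels.
    return float(np.abs(a - b)[keep].mean())
```

With the variables from the usage example, `unmasked_change(image, res_image, mask)` returns a value close to 0 when the non-inpainted area is well preserved (all three images must be the same size).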

📦 Installation

Install Diffusers from source:

pip uninstall diffusers
pip install git+https://github.com/huggingface/diffusers

💻 Usage Examples

Basic Usage

import torch
from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetInpaintingPipeline
from diffusers.utils import load_image

# Load the inpainting ControlNet; extra_conditioning_channels=1 adds an input
# channel for the mask alongside the masked-image latents.
controlnet = SD3ControlNetModel.from_pretrained(
    "alimama-creative/SD3-Controlnet-Inpainting", use_safetensors=True, extra_conditioning_channels=1
)
# Attach it to the SD3-medium base pipeline in half precision.
pipe = StableDiffusion3ControlNetInpaintingPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.text_encoder.to(torch.float16)
pipe.controlnet.to(torch.float16)
pipe.to("cuda")

# Input image and its mask (white marks the region to repaint).
image = load_image(
    "https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting/resolve/main/images/dog.png"
)
mask = load_image(
    "https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting/resolve/main/images/dog_mask.png"
)
width = 1024   # the model was trained at 1024x1024 only
height = 1024
prompt = "A cat is sitting next to a puppy."
generator = torch.Generator(device="cuda").manual_seed(24)  # fixed seed for reproducibility
res_image = pipe(
    negative_prompt="deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, extra limb, missing limb, floating limbs, mutated hands and fingers, disconnected limbs, mutation, mutated, ugly, disgusting, blurry, amputation, NSFW",
    prompt=prompt,
    height=height,
    width=width,
    control_image=image,
    control_mask=mask,
    num_inference_steps=28,
    generator=generator,
    controlnet_conditioning_scale=0.95,
    guidance_scale=7,
).images[0]
res_image.save("sd3.png")
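The call above passes `control_mask` straight from a PNG. If your own mask is RGB or anti-aliased, a small preprocessing step makes explicit which pixels will be repainted. This is a sketch under assumptions: the pipeline may already threshold the mask internally, the 128 threshold is arbitrary, and `binarize_mask` is a hypothetical helper.

```python
from PIL import Image


def binarize_mask(mask: Image.Image, width: int, height: int, threshold: int = 128) -> Image.Image:
    """Return a single-channel 0/255 mask after checking it matches the output size.

    Pixels >= threshold become 255 (repaint); everything else becomes 0 (keep).
    """
    gray = mask.convert("L")
    if gray.size != (width, height):
        raise ValueError(f"mask is {gray.size}, expected {(width, height)}")
    return gray.point(lambda p: 255 if p >= threshold else 0)
```

Usage with the example above: `mask = binarize_mask(mask, width, height)` before calling `pipe(...)`. Raising on a size mismatch, rather than resizing silently, is a deliberate choice so that any resizing happens where you can see it.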

🔧 Technical Details

The model was trained for 20k steps at a resolution of 1024x1024 on 12M images from LAION-2B and internal sources.

Mixed precision: FP16
Learning rate: 1e-4
Batch size: 192
Timestep sampling mode: 'logit_normal'
Loss: Flow Matching
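The last two settings describe SD3's rectified-flow objective. Below is a NumPy sketch of one loss evaluation, assuming the standard SD3 formulation: `t = sigmoid(u)` with `u ~ Normal(0, 1)`, the noisy input `x_t = (1 - t) * x0 + t * noise`, and the velocity target `noise - x0`. The mean and std of 0 and 1 are assumed defaults, and `model` is a generic stand-in callable, not the actual SD3 transformer with ControlNet.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_logit_normal(n: int, mean: float = 0.0, std: float = 1.0) -> np.ndarray:
    """Timesteps t in (0, 1): t = sigmoid(u) with u ~ Normal(mean, std)."""
    u = rng.normal(mean, std, size=n)
    return 1.0 / (1.0 + np.exp(-u))


def flow_matching_loss(model, x0: np.ndarray) -> float:
    """Rectified-flow MSE: predict the velocity (noise - x0) from x_t and t."""
    batch = x0.shape[0]
    t = sample_logit_normal(batch)
    t_b = t.reshape(batch, *([1] * (x0.ndim - 1)))  # broadcast t over non-batch dims
    noise = rng.standard_normal(x0.shape)
    x_t = (1.0 - t_b) * x0 + t_b * noise  # straight-line path from data to noise
    target = noise - x0
    pred = model(x_t, t)
    return float(np.mean((pred - target) ** 2))
```

The logit-normal distribution concentrates timesteps around t = 0.5 rather than sampling them uniformly. In real training, `x0` would be VAE latents and `model` the denoiser.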

📚 Documentation

Examples


a woman wearing a white jacket, black hat and black pants is standing in a field, the hat writes SD3


a person wearing a white shoe, carrying a white bucket with text "alibaba" on it


Comparison with SDXL-Inpainting

From left to right: Input image, Masked image, SDXL inpainting, Ours.
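A comparison strip in this layout can be assembled with Pillow. This is a sketch and not the authors' script: it assumes white mask pixels mark the region to inpaint, fills that region with flat gray for the "masked image" panel, and uses hypothetical helper names.

```python
from PIL import Image


def masked_preview(image: Image.Image, mask: Image.Image, fill=(128, 128, 128)) -> Image.Image:
    """Show the input with the region to inpaint (white mask pixels) painted a flat color."""
    rgb = image.convert("RGB")
    flat = Image.new("RGB", rgb.size, fill)
    binary = mask.convert("L").resize(rgb.size, Image.NEAREST).point(lambda p: 255 if p >= 128 else 0)
    # Image.composite takes pixels from the first image where the mask is 255.
    return Image.composite(flat, rgb, binary)


def side_by_side(images: list[Image.Image]) -> Image.Image:
    """Paste images left to right on one white canvas as tall as the tallest image."""
    width = sum(im.width for im in images)
    height = max(im.height for im in images)
    canvas = Image.new("RGB", (width, height), (255, 255, 255))
    x = 0
    for im in images:
        canvas.paste(im.convert("RGB"), (x, 0))
        x += im.width
    return canvas
```

Usage: `side_by_side([image, masked_preview(image, mask), sdxl_result, res_image])`, where `sdxl_result` is a hypothetical output from an SDXL inpainting run.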

a tiger sitting on a park bench

a dog sitting on a park bench

a young woman wearing a blue and pink floral dress

a woman wearing a white jacket, black hat and black pants is standing in a field, the hat writes SD3

an air conditioner hanging on the bedroom wall

Limitation

Because only 1024x1024 images were used during training, inference performs best at this resolution; other sizes yield suboptimal results. We plan to start multi-resolution training and will open-source the new weights when it is complete.
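For inputs that are not 1024x1024, one workaround is to scale the image so its longer side is 1024 and pad it to a square, then crop the result back afterwards. This is a sketch under assumptions, not a method from the authors: padding is filled with gray in the image and black (keep) in the mask, assuming white marks the region to inpaint, and the padded borders may still influence the result. `fit_to_square` is a hypothetical helper.

```python
from PIL import Image

TARGET = 1024  # training resolution stated above


def fit_to_square(image: Image.Image, mask: Image.Image, target: int = TARGET):
    """Scale image and mask so the longer side equals `target`, then pad to target x target.

    Returns (image, mask, box), where box = (left, top, right, bottom) locates the
    original content so the generated result can be cropped back afterwards.
    """
    scale = target / max(image.width, image.height)
    new_w = max(1, round(image.width * scale))
    new_h = max(1, round(image.height * scale))
    img = image.convert("RGB").resize((new_w, new_h), Image.LANCZOS)
    msk = mask.convert("L").resize((new_w, new_h), Image.NEAREST)  # keep mask values crisp
    left, top = (target - new_w) // 2, (target - new_h) // 2
    canvas_img = Image.new("RGB", (target, target), (127, 127, 127))
    canvas_msk = Image.new("L", (target, target), 0)  # padding is marked as "keep"
    canvas_img.paste(img, (left, top))
    canvas_msk.paste(msk, (left, top))
    return canvas_img, canvas_msk, (left, top, left + new_w, top + new_h)
```

After generation, `res_image.crop(box).resize(original.size)` restores the original framing.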

📄 License

The model is fine-tuned from SD3, so it is released under the original SD3 license.
