# EcomXL Inpaint ControlNet
EcomXL is a series of text-to-image diffusion models tailored for e-commerce scenarios, built on top of Stable Diffusion XL. For e-commerce use cases, we trained an Inpaint ControlNet to condition the diffusion models. Unlike generic inpaint ControlNets, this model is fine-tuned with instance masks to avoid foreground outpainting.
## Quick Start
The following sections will guide you through the usage and details of the EcomXL Inpaint ControlNet.
## Features
- E-commerce Optimization: Specifically designed for e-commerce scenarios, providing more targeted and effective text-to-image generation.
- Instance Mask Fine-tuning: Fine-tuned with instance masks to prevent foreground outpainting, ensuring better image quality (see the sketch below for one way to derive such a mask).
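An instance mask is a binary map that covers everything except the product itself. If your product cutout already has an alpha channel, one straightforward way to obtain such a mask is to binarize that channel. The helper below is a hypothetical sketch (the function name and threshold are our own, not part of this repository); white marks the background region to repaint, which is the polarity `make_inpaint_condition` in the example below expects:

```python
import numpy as np
from PIL import Image

def instance_mask_from_alpha(rgba_image: Image.Image, threshold: int = 127) -> Image.Image:
    """Hypothetical helper: binarize a cutout's alpha channel into an instance mask.

    White (255) = transparent background to repaint; black (0) = product to keep.
    """
    alpha = np.array(rgba_image.convert("RGBA"))[:, :, 3]
    mask = np.where(alpha <= threshold, 255, 0).astype(np.uint8)
    return Image.fromarray(mask, mode="L")
```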
## Usage Examples

### Basic Usage
```python
from diffusers import (
    ControlNetModel,
    StableDiffusionXLControlNetPipeline,
    DDPMScheduler,
)
from diffusers.utils import load_image
import torch
from PIL import Image
import numpy as np


def make_inpaint_condition(init_image, mask_image):
    """Build the ControlNet conditioning image: masked pixels are set to -1."""
    init_image = np.array(init_image.convert("RGB")).astype(np.float32) / 255.0
    mask_image = np.array(mask_image.convert("L")).astype(np.float32) / 255.0
    assert init_image.shape[0:2] == mask_image.shape[0:2], \
        "image and image_mask must have the same image size"
    init_image[mask_image > 0.5] = -1.0  # mark the region to repaint
    init_image = np.expand_dims(init_image, 0).transpose(0, 3, 1, 2)  # HWC -> NCHW
    init_image = torch.from_numpy(init_image)
    return init_image


def add_fg(full_img, fg_img, mask_img):
    """Paste the original foreground back over the generated background."""
    full_img = np.array(full_img).astype(np.float32)
    fg_img = np.array(fg_img).astype(np.float32)
    mask_img = np.array(mask_img).astype(np.float32) / 255.0
    full_img = full_img * mask_img + fg_img * (1 - mask_img)
    return Image.fromarray(np.clip(full_img, 0, 255).astype(np.uint8))


controlnet = ControlNetModel.from_pretrained(
    "alimama-creative/EcomXL_controlnet_inpaint",
    use_safetensors=True,
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
)
pipe.to("cuda")
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

image = load_image(
    "https://huggingface.co/alimama-creative/EcomXL_controlnet_inpaint/resolve/main/images/inp_0.png"
)
mask = load_image(
    "https://huggingface.co/alimama-creative/EcomXL_controlnet_inpaint/resolve/main/images/inp_1.png"
)
# Invert the mask so that white marks the background region to repaint.
mask = Image.fromarray(255 - np.array(mask))

control_image = make_inpaint_condition(image, mask)

prompt = "a product on the table"
generator = torch.Generator(device="cuda").manual_seed(1234)

res_image = pipe(
    prompt,
    image=control_image,
    num_inference_steps=25,
    guidance_scale=7,
    width=1024,
    height=1024,
    controlnet_conditioning_scale=0.5,
    generator=generator,
).images[0]

# Composite the untouched foreground back onto the generated image.
res_image = add_fg(res_image, image, mask)
res_image.save("res.png")
```
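If GPU memory is tight, the same pipeline can usually be loaded in half precision. This is standard diffusers usage rather than anything specific to this model:

```python
# Optional: half-precision loading to roughly halve GPU memory (standard diffusers usage).
controlnet = ControlNetModel.from_pretrained(
    "alimama-creative/EcomXL_controlnet_inpaint",
    use_safetensors=True,
    torch_dtype=torch.float16,
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.to("cuda")
```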
The model exhibits good performance when the ControlNet weight (`controlnet_conditioning_scale`) is 0.5.
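To see how the strength affects your own images, you can sweep the scale around that value. The loop below reuses the objects from the example above (the output filenames are our own):

```python
# Sweep the ControlNet strength around the recommended 0.5.
for scale in (0.3, 0.5, 0.7, 0.9):
    out = pipe(
        prompt,
        image=control_image,
        num_inference_steps=25,
        guidance_scale=7,
        width=1024,
        height=1024,
        controlnet_conditioning_scale=scale,
        generator=torch.Generator(device="cuda").manual_seed(1234),
    ).images[0]
    add_fg(out, image, mask).save(f"res_scale_{scale}.png")
```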
## Documentation

### Examples
These examples were generated with AUTOMATIC1111/stable-diffusion-webui.
*(Comparison image grid omitted; columns: Foreground | Mask | w/o instance mask | w/ instance mask.)*
## Technical Details
- Training Phases:
  - First phase: trained for 20k steps on 12M images from laion2B and internal sources, using random masks.
  - Second phase: trained for 20k steps on 3M e-commerce images, using instance masks.
- Mixed Precision: FP16
- Learning Rate: 1e-4
- Batch Size: 2048
- Noise Offset: 0.05
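For reference, "noise offset" commonly refers to adding a small random per-channel constant to the sampled noise during training so the model learns to shift overall image brightness. The snippet below shows the widely used formulation with an offset of 0.05; it is a sketch of the general technique, not the authors' training code:

```python
import torch

def sample_noise_with_offset(latents: torch.Tensor, noise_offset: float = 0.05) -> torch.Tensor:
    """Common noise-offset trick: perturb each (batch, channel) noise map by a random constant."""
    noise = torch.randn_like(latents)
    # One random offset per sample and channel, broadcast over height and width.
    noise += noise_offset * torch.randn(
        latents.shape[0], latents.shape[1], 1, 1,
        device=latents.device, dtype=latents.dtype,
    )
    return noise
```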
## License
This project is licensed under the Apache-2.0 license.