# stable-diffusion-xl-inpainting-1.0-GGUF

This is a latent text-to-image diffusion model for inpainting, based on Stable Diffusion XL and quantized in GGUF format.
## Quick Start

!!! Experimental: supported by gpustack/llama-box v0.0.98+ only !!!
Model creator: Diffusers
Original model: stable-diffusion-xl-1.0-inpainting-0.1
GGUF quantization: based on stable-diffusion.cpp ac54e, patched by llama-box.
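As a rough sketch of how these GGUF files might be served: llama-box exposes OpenAI-compatible endpoints, so a running instance loaded with this model could be called through the standard `openai` Python client. The host, port, model id, and the availability of an image-edit route below are assumptions, not confirmed details; check the llama-box documentation for the actual launch flags and supported endpoints.

```python
# Hypothetical sketch: assumes a local llama-box server is already running with the
# inpainting GGUF loaded and that it exposes an OpenAI-compatible image-edit endpoint.
# Host, port, and model id are placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

result = client.images.edit(
    model="stable-diffusion-xl-inpainting-1.0",                 # placeholder model id
    image=open("overture-creations-5sI6fQgYIuo.png", "rb"),      # local copy of the example image
    mask=open("overture-creations-5sI6fQgYIuo_mask.png", "rb"),  # local copy of the example mask
    prompt="a tiger sitting on a park bench",
    n=1,
    size="1024x1024",
    response_format="b64_json",
)

# Decode and save the first returned image.
with open("inpainted.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```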
| Property | Details |
| --- | --- |
| Model Type | Diffusion-based text-to-image generative model |
| License | CreativeML Open RAIL++-M License |
| Quantization | FP16: FP16 for OpenAI CLIP ViT-L/14, OpenCLIP ViT-G/14, and VAE<br/>Q8_0: FP16 for OpenAI CLIP ViT-L/14, OpenCLIP ViT-G/14, and VAE<br/>Q4_1: FP16 for OpenAI CLIP ViT-L/14, OpenCLIP ViT-G/14, and VAE<br/>Q4_0: FP16 for OpenAI CLIP ViT-L/14, OpenCLIP ViT-G/14, and VAE |
## Usage Examples

### Basic Usage
```python
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
import torch

# Load the SDXL inpainting pipeline in fp16 and move it to the GPU.
pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

# Image and mask must share the same resolution; SDXL works best at 1024x1024.
image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a tiger sitting on a park bench"
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # steps between 15 and 30 work well
    strength=0.99,           # keep strength below 1.0 (see Limitations)
    generator=generator,
).images[0]
```
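A small follow-up, assuming the example above has just run (so `image` now holds the inpainted result). `make_image_grid` is a helper available in recent `diffusers` releases; if your version lacks it, any side-by-side comparison will do:

```python
from diffusers.utils import load_image, make_image_grid

# Save the inpainted result to disk.
image.save("tiger_on_bench.png")

# Compare input, mask, and output side by side.
init_image = load_image(img_url).resize((1024, 1024))
grid = make_image_grid([init_image, mask_image, image], rows=1, cols=3)
grid.save("comparison.png")
```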
### How it works

| image | mask_image |
| :---: | :---: |
| *(input image)* | *(mask image)* |

| prompt | Output |
| :---: | :---: |
| a tiger sitting on a park bench | *(inpainted result)* |
## Documentation

### Model Description
This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
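To make the two-encoder layout concrete, here is a small sketch (using the `diffusers` checkpoint referenced in the usage example above) that prints the components the inpainting pipeline is built from:

```python
from diffusers import AutoPipelineForInpainting
import torch

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
)

# SDXL carries two frozen text encoders plus the denoising UNet and the VAE.
print(type(pipe.text_encoder).__name__)    # CLIP ViT-L text encoder
print(type(pipe.text_encoder_2).__name__)  # OpenCLIP ViT-G text encoder
print(type(pipe.unet).__name__)            # denoising UNet (operates in latent space)
print(type(pipe.vae).__name__)             # VAE mapping between pixel and latent space
```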
### Uses

#### Direct Use
The model is intended for research purposes only. Possible research areas and tasks include:
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
#### Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events; using it to generate such content is therefore out of scope for its capabilities.
### Limitations and Bias

#### Limitations
- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to "A red cube on top of a blue sphere".
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.
- When the strength parameter is set to 1 (i.e. starting inpainting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investigating this and working on the next version; a rough illustration follows below.
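A rough illustration of this behaviour, reusing `pipe`, `img_url`, `mask_image`, and `prompt` from the Basic Usage example above (the exact settings are illustrative, not a recommendation):

```python
from diffusers.utils import load_image

init_image = load_image(img_url).resize((1024, 1024))

# strength=1.0: the masked region starts from fully re-noised latents; results tend to look softer.
fully_renoised = pipe(prompt=prompt, image=init_image, mask_image=mask_image,
                      guidance_scale=8.0, num_inference_steps=20,
                      strength=1.0).images[0]

# strength just below 1.0 (as in the Basic Usage example) keeps images noticeably sharper.
slightly_below = pipe(prompt=prompt, image=init_image, mask_image=mask_image,
                      guidance_scale=8.0, num_inference_steps=20,
                      strength=0.99).images[0]
```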
#### Bias
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
## License

This model is licensed under the CreativeML Open RAIL++-M License.