# stable-diffusion-xl-inpainting-1.0-GGUF

This is a latent text-to-image diffusion model for inpainting, based on Stable Diffusion XL and quantized in GGUF format.
## Quick Start

!!! Experimental: supported by gpustack/llama-box v0.0.98+ only !!!
Model creator: Diffusers
Original model: stable-diffusion-xl-1.0-inpainting-0.1
GGUF quantization: based on stable-diffusion.cpp ac54e, patched by llama-box.
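As a rough sketch of how these GGUF files might be served: llama-box exposes OpenAI-compatible endpoints, so a running instance loaded with this model could be called through the standard `openai` Python client. The host, port, model id, and the availability of an image-edit route below are assumptions, not confirmed details; check the llama-box documentation for the actual launch flags and supported endpoints.

```python
# Hypothetical sketch: assumes a local llama-box server is already running with the
# inpainting GGUF loaded and that it exposes an OpenAI-compatible image-edit endpoint.
# Host, port, and model id are placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

result = client.images.edit(
    model="stable-diffusion-xl-inpainting-1.0",                 # placeholder model id
    image=open("overture-creations-5sI6fQgYIuo.png", "rb"),      # local copy of the example image
    mask=open("overture-creations-5sI6fQgYIuo_mask.png", "rb"),  # local copy of the example mask
    prompt="a tiger sitting on a park bench",
    n=1,
    size="1024x1024",
    response_format="b64_json",
)

# Decode and save the first returned image.
with open("inpainted.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```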
| Property | Details |
| --- | --- |
| Model Type | Diffusion-based text-to-image generative model |
| License | CreativeML Open RAIL++-M License |
| Quantization | FP16: FP16 for OpenAI CLIP ViT-L/14, OpenCLIP ViT-G/14, and VAE<br/>Q8_0: FP16 for OpenAI CLIP ViT-L/14, OpenCLIP ViT-G/14, and VAE<br/>Q4_1: FP16 for OpenAI CLIP ViT-L/14, OpenCLIP ViT-G/14, and VAE<br/>Q4_0: FP16 for OpenAI CLIP ViT-L/14, OpenCLIP ViT-G/14, and VAE |
## Usage Examples

### Basic Usage
```python
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
import torch

# Load the SDXL inpainting pipeline in fp16 and move it to the GPU.
pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

# Image and mask must share the same resolution; SDXL works best at 1024x1024.
image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a tiger sitting on a park bench"
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # steps between 15 and 30 work well
    strength=0.99,           # keep strength below 1.0 (see Limitations)
    generator=generator,
).images[0]
```
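A small follow-up, assuming the example above has just run (so `image` now holds the inpainted result). `make_image_grid` is a helper available in recent `diffusers` releases; if your version lacks it, any side-by-side comparison will do:

```python
from diffusers.utils import load_image, make_image_grid

# Save the inpainted result to disk.
image.save("tiger_on_bench.png")

# Compare input, mask, and output side by side.
init_image = load_image(img_url).resize((1024, 1024))
grid = make_image_grid([init_image, mask_image, image], rows=1, cols=3)
grid.save("comparison.png")
```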
### How it works

| image | mask_image |
| :---: | :---: |
| *(input image)* | *(mask image)* |

| prompt | Output |
| :---: | :---: |
| a tiger sitting on a park bench | *(inpainted result)* |
## Documentation

### Model Description
This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
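To make the two-encoder layout concrete, here is a small sketch (using the `diffusers` checkpoint referenced in the usage example above) that prints the components the inpainting pipeline is built from:

```python
from diffusers import AutoPipelineForInpainting
import torch

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
)

# SDXL carries two frozen text encoders plus the denoising UNet and the VAE.
print(type(pipe.text_encoder).__name__)    # CLIP ViT-L text encoder
print(type(pipe.text_encoder_2).__name__)  # OpenCLIP ViT-G text encoder
print(type(pipe.unet).__name__)            # denoising UNet (operates in latent space)
print(type(pipe.vae).__name__)             # VAE mapping between pixel and latent space
```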
### Uses

#### Direct Use
The model is intended for research purposes only. Possible research areas and tasks include:
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
#### Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events; using it to generate such content is therefore out of scope for its capabilities.
### Limitations and Bias

#### Limitations
- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to "A red cube on top of a blue sphere".
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.
- When the strength parameter is set to 1 (i.e. starting inpainting from a fully masked image), the quality of the image is degraded. The model retains the non-masked contents of the image, but images look less sharp. We're investigating this and working on the next version; a rough illustration follows below.
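A rough illustration of this behaviour, reusing `pipe`, `img_url`, `mask_image`, and `prompt` from the Basic Usage example above (the exact settings are illustrative, not a recommendation):

```python
from diffusers.utils import load_image

init_image = load_image(img_url).resize((1024, 1024))

# strength=1.0: the masked region starts from fully re-noised latents; results tend to look softer.
fully_renoised = pipe(prompt=prompt, image=init_image, mask_image=mask_image,
                      guidance_scale=8.0, num_inference_steps=20,
                      strength=1.0).images[0]

# strength just below 1.0 (as in the Basic Usage example) keeps images noticeably sharper.
slightly_below = pipe(prompt=prompt, image=init_image, mask_image=mask_image,
                      guidance_scale=8.0, num_inference_steps=20,
                      strength=0.99).images[0]
```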
#### Bias
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
## License

This model is licensed under the CreativeML Open RAIL++-M License.