🚀 Stable Diffusion Inpainting
Stable Diffusion Inpainting is a latent text-to-image diffusion model. It can generate photo-realistic images from text input and has the ability to inpaint pictures using a mask.
🚀 Quick Start
Stable Diffusion Inpainting can generate high-quality images from text prompts and inpaint masked regions of existing images. You can use it through the 🧨 Diffusers library or the RunwayML GitHub repository.
✨ Features
- Text-to-Image Generation: Capable of generating photo-realistic images from any text input.
- Inpainting Function: Can inpaint pictures using a mask.
📦 Installation
Using Diffusers
```bash
pip install diffusers transformers torch
```
Using RunwayML GitHub Repository
- Download the weights [sd-v1-5-inpainting.ckpt](https://huggingface.co/runwayml/stable-diffusion-inpainting/resolve/main/sd-v1-5-inpainting.ckpt).
- Follow the instructions [here](https://github.com/runwayml/stable-diffusion#inpainting-with-stable-diffusion).
💻 Usage Examples
Basic Usage
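```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # float16 inference is intended for a CUDA GPU

# image and mask_image must be PIL images (the file paths are placeholders).
# White mask pixels are inpainted; black mask pixels are kept as is.
image = load_image("./input.png")
mask_image = load_image("./mask.png")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("./yellow_cat_on_park_bench.png")
```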
Advanced Usage
The pipeline call can be adapted to different requirements, for example by changing the prompt or by supplying a different image and mask.
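For example, the mask can be built programmatically instead of being drawn by hand. The sketch below assumes Pillow (which 🧨 Diffusers already uses for images); `make_rect_mask` and `snap_to_multiple_of_8` are hypothetical helper names for illustration, not part of Diffusers. It follows the convention that white mask pixels are inpainted and black pixels are kept, and it rounds sizes down to multiples of 8 on the assumption that such sizes are the safe choice, since the autoencoder works at 1/8 resolution.

```python
from PIL import Image, ImageDraw


def make_rect_mask(size, box):
    """Hypothetical helper: build a rectangular inpainting mask.

    size: (width, height) of the image being inpainted.
    box:  (left, top, right, bottom) of the region to inpaint.
    White (255) pixels are inpainted; black (0) pixels are kept as is.
    """
    mask = Image.new("L", size, 0)                 # all black: keep everything
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # paint the hole white
    return mask


def snap_to_multiple_of_8(image, mask):
    """Hypothetical helper: resize image and mask to one shared size whose
    width and height are rounded down to multiples of 8 (assumed safe sizes,
    since the autoencoder works at 1/8 resolution)."""
    width, height = image.size
    target = (max(8, width - width % 8), max(8, height - height % 8))
    resized_image = image.convert("RGB").resize(target, Image.LANCZOS)
    # NEAREST keeps the mask strictly black and white (no gray edge pixels).
    resized_mask = mask.convert("L").resize(target, Image.NEAREST)
    return resized_image, resized_mask
```

The resulting pair can be passed as `image=` and `mask_image=` to the pipeline call. Other call arguments such as `num_inference_steps`, `guidance_scale`, `negative_prompt`, and `generator` (for example `torch.Generator("cuda").manual_seed(0)` for reproducible results) can be tuned in the same call.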
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Developed by | Robin Rombach, Patrick Esser |
| Model Type | Diffusion-based text-to-image generation model |
| Language(s) | English |
| License | [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based. |
| Model Description | This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper. |
| Resources for more information | [GitHub Repository](https://github.com/runwayml/stable-diffusion), Paper |
| Cite as | `@InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj{\"o}rn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684--10695}}` |
Uses
Direct Use
The model is for research purposes only. Possible research areas and tasks include:
- Safe deployment of models which have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.
Misuse, Malicious Use, and Out-of-Scope Use
The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.
- Out-of-Scope Use: The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope.
- Misuse and Malicious Use: Using the model to generate content that is cruel to individuals is a misuse. This includes generating demeaning, dehumanizing, or otherwise harmful representations; promoting discriminatory content; impersonating individuals without consent; generating sexual content without consent; spreading mis- and disinformation; representing egregious violence and gore; and sharing copyrighted or licensed material in violation of its terms of use.
Limitations and Bias
Limitations
- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model performs poorly on tasks involving compositionality, such as rendering an image of "A red cube on top of a blue sphere".
- Faces and people may not be generated properly.
- The model was mainly trained with English captions and works worse in other languages.
- The autoencoding part of the model is lossy.
- The model was trained on a dataset that contains adult material and needs additional safety mechanisms before it is used in products.
- Because the training data was not deduplicated, there is some degree of memorization of images that are duplicated in it.
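One practical consequence of the lossy autoencoder is that pixels outside the mask can shift slightly after inpainting. A common post-processing step, assumed here and not included in the basic usage example, is to copy the original pixels back wherever the mask is black. A minimal Pillow sketch; `paste_back_unmasked` is a hypothetical helper name:

```python
from PIL import Image


def paste_back_unmasked(original, inpainted, mask):
    """Hypothetical post-processing helper (not part of the pipeline).

    Keeps the model output where the mask is white and restores the original
    pixels where it is black, undoing autoencoder drift outside the mask.
    """
    size = inpainted.size
    # All three images must share one size; match the model output's size.
    original = original.convert("RGB").resize(size)
    mask = mask.convert("L").resize(size, Image.NEAREST)
    # Image.composite takes pixels from its first argument where the mask is
    # 255 and from its second argument where the mask is 0.
    return Image.composite(inpainted.convert("RGB"), original, mask)
```

A hard black/white mask can leave a visible seam at the boundary; blurring the mask slightly before compositing is one way to soften it, at the cost of letting a little drift back in near the edge.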
Bias
The model was trained on subsets of [LAION-2B(en)](https://laion.ai/blog/laion-5b/), which consists mainly of images with English descriptions. As a result, non-English communities and cultures are underrepresented, and the model performs worse with non-English prompts.
Training
Training Data
The model developers used the following dataset for training:
- LAION-2B (en) and subsets thereof.
Training Procedure
Stable Diffusion v1 is a latent diffusion model that combines an autoencoder with a diffusion model trained in the autoencoder's latent space. During training:
- Images are encoded into latent representations by an encoder.
- Text prompts are encoded by a ViT-L/14 text encoder.
- The non-pooled output of the text encoder is fed into the UNet backbone via cross-attention.
- The loss is a reconstruction objective between the noise that was added to the latent and the UNet's prediction of that noise.
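The training objective can be illustrated with a toy NumPy sketch. It assumes the standard DDPM forward process for noising the latent, the Stable Diffusion v1 latent shape (4 channels at 1/8 of the image resolution), and a caller-supplied function standing in for the UNet; text conditioning via cross-attention is left out. `ldm_training_loss` is a hypothetical name, and the noise schedule is whatever the caller passes in, not necessarily the one used for the released checkpoints.

```python
import numpy as np


def ldm_training_loss(z0, noise_pred_fn, alphas_cumprod, rng):
    """Hypothetical sketch: one sample of the noise-prediction loss.

    z0:             clean latent from the encoder, e.g. shape (4, H/8, W/8).
    noise_pred_fn:  stand-in for the UNet, called as noise_pred_fn(z_t, t);
                    text conditioning is omitted in this sketch.
    alphas_cumprod: cumulative product of (1 - beta_t), one entry per step.
    rng:            a numpy.random.Generator.
    """
    t = rng.integers(len(alphas_cumprod))    # random timestep
    eps = rng.standard_normal(z0.shape)      # noise added to the latent
    a_bar = alphas_cumprod[t]
    # DDPM forward process: blend the clean latent with Gaussian noise.
    z_t = np.sqrt(a_bar) * z0 + np.sqrt(1.0 - a_bar) * eps
    eps_hat = noise_pred_fn(z_t, t)          # model's noise prediction
    # Reconstruction objective: mean squared error between noise and prediction.
    return np.mean((eps - eps_hat) ** 2)
```

With an oracle that recovers the added noise exactly from `z_t`, the loss is zero; a real UNet is trained to push its prediction toward that noise across all timesteps.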
We currently provide six checkpoints, sd-v1-1.ckpt, sd-v1-2.ckpt, sd-v1-3.ckpt, sd-v1-4.ckpt, sd-v1-5.ckpt and sd-v1-5-inpainting.ckpt, which were trained as described in the original model card.
📄 License
This model is under the [CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license). Please read the full license before using the model. By accessing the repository, you accept that your contact information can be shared with the model authors.