Diffusers Inpainting Text Box: Open-Source Image Generation Model for Creating Realistic Images from Text Input

Diffusers Inpainting Text Box

Developed by gligen

Stable Diffusion is a latent text-to-image diffusion model capable of generating realistic images from arbitrary text inputs.

Image Generation | Open Source | License: OpenRAIL | #Text-to-Image Generation #High-Resolution Diffusion Model #Art Creation Tool

Downloads 130

Release Time: 3/11/2023

Model Overview

A diffusion-based text-to-image generation model utilizing latent diffusion model architecture, supporting high-quality image generation from text descriptions.

Model Features

High-Quality Image Generation

Capable of generating high-resolution (512x512) realistic images from text inputs

Classifier-Free Guidance Sampling

Fine-tuned with 10% text-conditioning dropout to improve classifier-free guidance sampling

Memory Optimization

Supports attention slicing and can run on GPUs with less than 4GB of VRAM

Multi-Platform Support

Supports the PyTorch and JAX/Flax frameworks and can run on GPU or TPU

Model Capabilities

Text-to-Image Generation

Art Creation

Design Assistance

Creative Visualization

Use Cases

Art Creation

Concept Art Generation

Quickly generate concept art images from text descriptions

Can be used for pre-production concept design in games, films, etc.

Stylized Image Creation

Generate unique images by combining different artistic style prompts

Examples include Disney style and cyberpunk style.

Education & Research

Generative Model Research

Explore the limitations and possibilities of generative models

For academic research and experiments

Creative Tool Development

Develop creative assistance tools based on the model

Examples include design assistance applications and art creation tools.

🚀 Stable Diffusion v1-4 Model Card

Stable Diffusion is a latent text-to-image diffusion model that can generate photo-realistic images based on any text input. For more details about how Stable Diffusion works, refer to 🤗's Stable Diffusion with 🧨 Diffusers blog.

The Stable-Diffusion-v1-4 checkpoint was initialized with the weights of the [Stable-Diffusion-v1-2](https://huggingface.co/CompVis/stable-diffusion-v1-2) checkpoint and then fine-tuned for 225k steps at a resolution of 512x512 on "laion-aesthetics v2 5+" with a 10% drop of text-conditioning to enhance classifier-free guidance sampling.

These weights are designed to be used with the 🧨 Diffusers library. If you need the weights for the CompVis Stable Diffusion codebase, [click here](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original).
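
Classifier-free guidance sampling, mentioned above, is commonly described as extrapolating from an unconditional noise prediction toward the text-conditioned one. The sketch below illustrates that combination on plain Python lists. The function name `guided_noise`, the toy vectors, and the scale of 7.5 are illustrative assumptions, not values taken from this model card.

```python
def guided_noise(noise_uncond, noise_text, guidance_scale=7.5):
    """Combine two noise predictions element-wise.

    noise_uncond: prediction made with an empty prompt
    noise_text:   prediction made with the actual prompt
    A scale of 1.0 returns the text-conditioned prediction unchanged;
    larger scales push further in the direction the prompt suggests.
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy values standing in for one denoising step (hypothetical numbers).
uncond = [0.0, 1.0, -2.0]
text = [1.0, 1.0, 0.0]

print(guided_noise(uncond, text, guidance_scale=1.0))
print(guided_noise(uncond, text, guidance_scale=7.5))
```

In a real pipeline the two predictions are tensors produced by the same network, which is why training with some fraction of dropped captions matters: the model has to be able to predict noise without a prompt.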

✨ Features

Capable of generating photo-realistic images from text prompts.
Fine-tuned to improve classifier-free guidance sampling.
Can be used with different noise schedulers.
Supports both PyTorch and JAX/Flax for inference.

📦 Installation

We recommend using 🤗's Diffusers library to run Stable Diffusion.

PyTorch

pip install --upgrade diffusers transformers scipy

💻 Usage Examples

Basic Usage

import torch
from diffusers import StableDiffusionPipeline

model_id = "CompVis/stable-diffusion-v1-4"
device = "cuda"

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]  

image.save("astronaut_rides_horse.png")

Advanced Usage

Using a Different Noise Scheduler

from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-4"

# Use the Euler scheduler here instead
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]  

image.save("astronaut_rides_horse.png")

JAX/Flax

import jax
import numpy as np
from flax.jax_utils import replicate
from flax.training.common_utils import shard

from diffusers import FlaxStableDiffusionPipeline

pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", revision="flax", dtype=jax.numpy.bfloat16
)

prompt = "a photo of an astronaut riding a horse on mars"

prng_seed = jax.random.PRNGKey(0)
num_inference_steps = 50

num_samples = jax.device_count()
prompt = num_samples * [prompt]
prompt_ids = pipeline.prepare_inputs(prompt)

# shard inputs and rng
params = replicate(params)
prng_seed = jax.random.split(prng_seed, num_samples)
prompt_ids = shard(prompt_ids)

images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
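
The `shard` call and the final `reshape` in the example above are easier to follow as shape arithmetic. The sketch below tracks shapes only and needs no JAX installation. The helper names `shard_shape` and `unshard_shape`, the device count of 8, the sequence length of 77, and the 512x512 output size are assumptions chosen for illustration.

```python
def shard_shape(shape, n_devices):
    """Shape after splitting the leading batch axis across devices."""
    batch, *rest = shape
    if batch % n_devices:
        raise ValueError("batch size must be divisible by the device count")
    return (n_devices, batch // n_devices, *rest)


def unshard_shape(shape):
    """Shape after merging the device axis back into the batch axis."""
    n_devices, per_device, *rest = shape
    return (n_devices * per_device, *rest)


n_devices = 8  # hypothetical accelerator count

# One prompt per device, each tokenized to a fixed length (assumed 77).
prompt_ids_shape = (n_devices, 77)
sharded = shard_shape(prompt_ids_shape, n_devices)

# Each device returns its own batch of images (height, width, channels).
images_shape = (n_devices, 1, 512, 512, 3)
flat = unshard_shape(images_shape)

print(sharded)
print(flat)
```

The flattened shape is what `numpy_to_pil` receives in the example: one leading batch axis followed by the three image axes.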

📚 Documentation

Model Details

Property	Details
Developed by	Robin Rombach, Patrick Esser
Model Type	Diffusion-based text-to-image generation model
Language(s)	English
License	[The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based.
Model Description	This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses a fixed, pretrained text encoder (CLIP ViT-L/14) as suggested in the Imagen paper.
Resources for more information	[GitHub Repository](https://github.com/CompVis/stable-diffusion), Paper.
Cite as	@InProceedings{Rombach_2022_CVPR, author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj{\"o}rn}, title = {High-Resolution Image Synthesis With Latent Diffusion Models}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2022}, pages = {10684-10695} }
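
A Latent Diffusion Model, as named in the Model Description row, runs the diffusion process on a compressed latent rather than on pixels. The sketch below shows the shape arithmetic involved. The downsampling factor of 8 and the 4 latent channels are not stated on this page; they are assumptions based on how Stable Diffusion v1 is usually described, and `latent_shape` is a hypothetical helper.

```python
def latent_shape(height, width, downsample_factor=8, latent_channels=4):
    """Return (channels, height, width) of the latent for a given image size."""
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (latent_channels, height // downsample_factor, width // downsample_factor)


channels, lat_h, lat_w = latent_shape(512, 512)

pixel_values = 3 * 512 * 512                # RGB image
latent_values = channels * lat_h * lat_w    # compressed representation
ratio = pixel_values / latent_values

print((channels, lat_h, lat_w))
print(f"the denoiser works on {ratio:.0f}x fewer values than the pixel grid")
```

Under these assumptions the denoising network handles far fewer values per step, which is a large part of why the model fits on consumer GPUs.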

Uses

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.
Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.
Research on generative models.

Misuse, Malicious Use, and Out-of-Scope Use

The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.

Limitations and Bias

Limitations

The model does not achieve perfect photorealism.
The model cannot render legible text.
The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
Faces and people in general may not be generated properly.
The model was trained mainly with English captions and will not work as well in other languages.
The autoencoding part of the model is lossy.
The model was trained on a large-scale dataset [LAION-5B](https://laion.ai/blog/laion-5b/) which contains adult material and is not fit for product use without additional safety mechanisms and considerations.
No additional measures were used to deduplicate the dataset. As a result, we observe some degree of memorization for images that are duplicated in the training data. The training data can be searched at [https://rom1504.github.io/clip-retrieval/](https://rom1504.github.io/clip-retrieval/) to possibly assist in the detection of memorized images.

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases. Stable Diffusion v1 was trained on subsets of [LAION-2B(en)](https://laion.ai/blog/laion-5b/), which consists of images that are primarily limited to English descriptions. Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default. Further, the ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts.

Safety Module

The intended use of this model is with the Safety Checker in Diffusers. This checker works by checking model outputs against known hard-coded NSFW concepts. The concepts are intentionally hidden to reduce the likelihood of reverse-engineering this filter. Specifically, the checker compares the class probability of harmful concepts in the embedding space of the CLIPTextModel after generation of the images. The concepts are passed into the model with the generated image and compared to a hand-engineered weight for each NSFW concept.
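
The comparison described above can be pictured as a similarity score per concept checked against a per-concept threshold. The sketch below is a simplified illustration of that idea, not the Diffusers implementation. The names `cosine_similarity` and `flagged_concepts`, the two-dimensional vectors, and the threshold values are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def flagged_concepts(image_embedding, concept_embeddings, thresholds):
    """Return the names of concepts whose similarity exceeds their threshold."""
    flagged = []
    for name, concept in concept_embeddings.items():
        score = cosine_similarity(image_embedding, concept) - thresholds[name]
        if score > 0:
            flagged.append(name)
    return flagged


# Toy embeddings; real embeddings have hundreds of dimensions.
concepts = {"concept_a": [1.0, 0.0], "concept_b": [0.0, 1.0]}
thresholds = {"concept_a": 0.8, "concept_b": 0.8}

print(flagged_concepts([0.9, 0.1], concepts, thresholds))
```

An image is treated as unsafe when the returned list is non-empty. Tuning each threshold by hand corresponds to the "hand-engineered weight" the text mentions.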

🔧 Technical Details

The Stable-Diffusion-v1-4 checkpoint was initialized with the weights of the [Stable-Diffusion-v1-2](https://huggingface.co/CompVis/stable-diffusion-v1-2) checkpoint and subsequently fine-tuned for 225k steps at resolution 512x512 on "laion-aesthetics v2 5+", with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
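
Dropping the text-conditioning for 10% of training examples can be sketched as a per-example coin flip. The snippet below assumes that "dropping" means replacing the caption with an empty string; the helper name `maybe_drop_caption`, the seed, and the sample size are illustrative choices rather than details from the training code.

```python
import random


def maybe_drop_caption(caption, drop_prob=0.1, rng=random):
    """Return an empty caption with probability drop_prob, else the caption."""
    return "" if rng.random() < drop_prob else caption


rng = random.Random(0)  # fixed seed so the run is repeatable
captions = ["a photo of an astronaut riding a horse on mars"] * 10_000
dropped = sum(1 for c in captions if maybe_drop_caption(c, 0.1, rng) == "")

print(f"dropped {dropped / len(captions):.1%} of captions")
```

Seeing some examples without a caption is what lets a single network produce both the conditional and the unconditional predictions that classifier-free guidance combines at sampling time.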

📄 License

The model is licensed under [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license).

⚠️ Important Note

This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage. The CreativeML OpenRAIL License specifies:

You can't use the model to deliberately produce or share illegal or harmful outputs or content.

The authors claim no rights on the outputs you generate. You are free to use them and are accountable for their use, which must not go against the provisions set in the license.

You may redistribute the weights and use the model commercially and/or as a service. If you do, please be aware that you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M with all your users. Please read the full license carefully here: https://huggingface.co/spaces/CompVis/stable-diffusion-license

💡 Usage Tip

If you are limited by GPU memory and have less than 4GB of GPU RAM available, load the StableDiffusionPipeline in float16 precision instead of the default float32 precision. You can do so by telling diffusers to expect the weights to be in float16 precision (torch_dtype=torch.float16, as in the PyTorch examples above). If you are limited by TPU memory, load the FlaxStableDiffusionPipeline in bfloat16 precision instead of the default float32 precision. You can do so by telling diffusers to load the weights from the "bf16" branch.
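
The effect of precision on weight memory is simple arithmetic: bytes per parameter times parameter count. The sketch below uses a round figure of one billion parameters purely for illustration; it is not the exact size of this checkpoint, and `weight_memory_gib` is a hypothetical helper. Activations and intermediate buffers add to the totals shown.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params, dtype):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


approx_params = 1_000_000_000  # illustrative round number, not a measured count

for dtype in ("float32", "float16", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(approx_params, dtype):.2f} GiB")
```

Halving the bytes per parameter halves the weight footprint, which is why half precision is the first thing to try on a small GPU or a memory-limited TPU.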
