SDXL-Turbo Open-Source Image Generation Model: Generate Realistic Images from Text Rapidly in a Single Evaluation

SDXL-Turbo

Developed by stabilityai

SDXL-Turbo is a fast generative text-to-image model capable of producing realistic images from text prompts through a single network evaluation.

Text-to-Image | Open Source | License: Other | #Real-time image generation #Single-step inference #Text-to-image

Downloads 304.13k

Release Time: 11/27/2023

Model Overview

SDXL-Turbo is a distilled version of SDXL 1.0, specifically trained for real-time synthesis using the novel Adversarial Diffusion Distillation (ADD) training method, enabling high-quality image generation in 1 to 4 steps.

Model Features

Real-time image generation

Capable of generating high-quality images in 1 to 4 steps, suitable for real-time applications.

Adversarial Diffusion Distillation technology

Utilizes the innovative ADD training method, combining score distillation and adversarial loss to ensure high image fidelity with low-step sampling.

High-quality output

Outperforms similar single-step and multi-step models in image quality and prompt adherence.

Model Capabilities

Text-to-image generation

Image-to-image transformation

Real-time image synthesis

Use Cases

Creative design

Artwork generation

Used for generating artworks, conceptual designs, and creative expressions.

Can produce images with artistic styles

Educational tool

Serves as an educational or creative tool to help visualize concepts and ideas.

Quickly generates teaching image materials

Research fields

Generative model research

Used for studying the techniques and applications of real-time generative models.

Model safety research

Investigates the safe deployment and potential risks of generative models.

🚀 SDXL-Turbo Model Card

SDXL-Turbo is a fast generative text-to-image model. It can synthesize photorealistic images from a text prompt in a single network evaluation, offering a real-time image generation solution. A real-time demo is available at http://clipdrop.co/stable-diffusion-turbo.

⚠️ Important Note

For commercial use, please refer to https://stability.ai/license.

🚀 Quick Start

Check out https://github.com/Stability-AI/generative-models to get started with the SDXL-Turbo model.

✨ Features

Fast image generation: Synthesize images from text prompts in a single network evaluation.
High-quality output: Capable of generating photorealistic images.
Novel training method: Based on Adversarial Diffusion Distillation (ADD) for high-quality low-step sampling.

📦 Installation

pip install diffusers transformers accelerate --upgrade

💻 Usage Examples

Basic Usage

Text-to-image

SDXL-Turbo does not make use of guidance_scale or negative_prompt, so we disable guidance by setting guidance_scale=0.0. The model preferably generates images of size 512x512, but larger image sizes work as well. A single step is enough to generate high-quality images.

from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
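
The settings described above (no guidance, one to four steps, 512x512 output) can be collected in a small helper. This is only a sketch: `turbo_text2image_kwargs` is a hypothetical name that is not part of diffusers, and the strict 1 to 4 step check is an assumption based on the step range this card mentions, not a limit enforced by the library.

```python
def turbo_text2image_kwargs(prompt: str, num_inference_steps: int = 1, size: int = 512) -> dict:
    """Build keyword arguments for a text-to-image call (hypothetical helper)."""
    # Assumption: stay inside the 1 to 4 step range the model card describes.
    if not 1 <= num_inference_steps <= 4:
        raise ValueError("expected 1 to 4 inference steps")
    return {
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": 0.0,  # guidance is disabled for SDXL-Turbo
        "height": size,         # 512x512 is the preferred output size
        "width": size,
    }


kwargs = turbo_text2image_kwargs("A cinematic shot of a baby racoon", num_inference_steps=4)
print(kwargs["num_inference_steps"], kwargs["guidance_scale"])
```

With the `pipe` object created in the example above, the call would then be `image = pipe(**kwargs).images[0]`.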

Image-to-image

When using SDXL-Turbo for image-to-image generation, make sure that num_inference_steps * strength is greater than or equal to 1. The image-to-image pipeline runs for int(num_inference_steps * strength) steps, e.g. 0.5 * 2.0 = 1 step in our example below.

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
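
The step rule stated above can be checked before calling the pipeline. The following is a minimal sketch, assuming the int(num_inference_steps * strength) formula given in this card; `effective_img2img_steps` is a hypothetical helper name and is not part of diffusers.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Return how many denoising steps an image-to-image call would run.

    Hypothetical helper that mirrors the rule quoted in the model card:
    int(num_inference_steps * strength) must be at least 1.
    """
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength is expected to lie in (0, 1]")
    steps = int(num_inference_steps * strength)
    if steps < 1:
        raise ValueError(
            "num_inference_steps * strength must be >= 1, "
            f"got {num_inference_steps} * {strength}"
        )
    return steps


# The settings from the example above: 2 steps at strength 0.5.
print(effective_img2img_steps(2, 0.5))  # 1
```

A call such as `effective_img2img_steps(1, 0.5)` raises `ValueError`, because int(0.5) is 0 and no denoising step would be run.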

📚 Documentation

Model Details

Model Description

SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis. It is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling large-scale foundational image diffusion models in 1 to 4 steps at high image quality. This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines this with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.

Property	Details
Developed by	Stability AI
Funded by	Stability AI
Model Type	Generative text-to-image model
Finetuned from model	SDXL 1.0 Base
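
As a rough sketch of how the two training signals in the model description fit together, the ADD objective can be written as a weighted sum. The symbols used here (the adversarial term, the distillation term, and the weight $\lambda$) are notation assumed for illustration and are not taken from this card; the technical report gives the exact formulation.

```latex
\mathcal{L}_{\mathrm{ADD}} = \mathcal{L}_{\mathrm{adv}} + \lambda \, \mathcal{L}_{\mathrm{distill}}
```

Here $\mathcal{L}_{\mathrm{adv}}$ stands for the adversarial loss that pushes for high image fidelity, and $\mathcal{L}_{\mathrm{distill}}$ for the score distillation term that uses the pretrained diffusion model as a teacher.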

Model Sources

For research purposes, we recommend our generative-models GitHub repository (https://github.com/Stability-AI/generative-models), which implements the most popular diffusion frameworks (both training and inference).

Property	Details
Repository	https://github.com/Stability-AI/generative-models
Paper	https://stability.ai/research/adversarial-diffusion-distillation
Demo	http://clipdrop.co/stable-diffusion-turbo

Evaluation

[Figures: comparison1 and comparison2, user preference charts]

The charts evaluate user preference for SDXL-Turbo over other single- and multi-step models. SDXL-Turbo evaluated at a single step is preferred by human voters in terms of image quality and prompt following over LCM-XL evaluated at four (or fewer) steps. In addition, we see that using four steps for SDXL-Turbo further improves performance. For details on the user study, we refer to the research paper.

Uses

Direct Use

The model is intended for both non-commercial and commercial usage. You can use this model for non-commercial or research purposes under this license. Possible research areas and tasks include:

Research on generative models.
Research on real-time applications of generative models.
Research on the impact of real-time generative models.
Safe deployment of models which have the potential to generate harmful content.
Probing and understanding the limitations and biases of generative models.
Generation of artworks and use in design and other artistic processes.
Applications in educational or creative tools.

For commercial use, please refer to https://stability.ai/membership.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities. The model should not be used in any way that violates Stability AI's Acceptable Use Policy (https://stability.ai/use-policy).

Limitations and Bias

Limitations

The generated images are of a fixed resolution (512x512 pixels), and the model does not achieve perfect photorealism.
The model cannot render legible text.
Faces and people in general may not be generated properly.
The autoencoding part of the model is lossy.

Recommendations

The model is intended for both non-commercial and commercial usage.

📄 License

The model is released under a custom license (listed as "other") named sai-nc-community. For more details, please refer to the license: https://huggingface.co/stabilityai/sdxl-turbo/blob/main/LICENSE.md
