🚀 SD-Turbo Model Card
SD-Turbo is a fast generative text-to-image model. It can synthesize photorealistic images from a text prompt in a single network evaluation. This model is released as a research artifact to study small, distilled text-to-image models. For better quality and prompt understanding, SDXL-Turbo is recommended.
Please note: for commercial use, refer to https://stability.ai/license.
🚀 Quick Start
Check out https://github.com/Stability-AI/generative-models
✨ Features
- Fast image synthesis: generates images from text prompts in a single network evaluation.
- Novel training method: uses Adversarial Diffusion Distillation (ADD) for high-quality image sampling in 1 to 4 steps.
📦 Installation
```bash
pip install diffusers transformers accelerate --upgrade
```
💻 Usage Examples
Basic Usage
Text-to-image
SD-Turbo does not make use of `guidance_scale` or `negative_prompt`, so we disable it with `guidance_scale=0.0`. The model works best at a resolution of 512x512, although higher image sizes also work. A single step is enough to generate high-quality images.
```python
from diffusers import AutoPipelineForText2Image
import torch

# Load the SD-Turbo text-to-image pipeline in half precision.
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."
# One inference step with guidance disabled, as recommended for SD-Turbo.
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
```
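The pipeline returns standard PIL images. As a minimal follow-up sketch, the `generator` argument and the output filename below are illustrative choices, not part of the original example:

```python
# Optional: fix the random seed for reproducible outputs (illustrative).
generator = torch.Generator("cuda").manual_seed(42)
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0, generator=generator).images[0]

# The result is a PIL image, so it can be saved directly (filename is arbitrary).
image.save("sd_turbo_output.png")
```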
Image-to-image
When using SD-Turbo for image-to-image generation, make sure that `num_inference_steps` * `strength` is greater than or equal to 1. The image-to-image pipeline runs for `int(num_inference_steps * strength)` steps, e.g. 2 * 0.5 = 1 step in the example below.
```python
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

# Load the SD-Turbo image-to-image pipeline in half precision.
pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")

# Load the source image and resize it to the model's preferred 512x512 resolution.
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png").resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
# num_inference_steps=2 and strength=0.5 give int(2 * 0.5) = 1 effective denoising step.
image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
```
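The step-count rule can be checked up front before calling the pipeline. The helper below is a small illustrative sketch, not part of the diffusers API:

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Illustrative helper: the image-to-image pipeline runs int(num_inference_steps * strength) steps."""
    steps = int(num_inference_steps * strength)
    if steps < 1:
        raise ValueError("num_inference_steps * strength must be at least 1 for SD-Turbo image-to-image")
    return steps

print(effective_steps(2, 0.5))  # -> 1, matching the example above
```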
📚 Documentation
Model Details
Model Description
SD-Turbo is a distilled version of Stable Diffusion 2.1, trained for real-time synthesis. It is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the technical report), which allows sampling large-scale foundational image diffusion models in 1 to 4 steps with high image quality. This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines it with an adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.
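Schematically, and only as a rough sketch of the idea described above (the exact formulation and weighting are given in the technical report), the student model is trained on a combination of the two losses:

$$\mathcal{L}_{\text{student}} = \mathcal{L}_{\text{adv}} + \lambda \, \mathcal{L}_{\text{distill}}$$

where $\mathcal{L}_{\text{adv}}$ is the adversarial loss provided by a discriminator, $\mathcal{L}_{\text{distill}}$ is the score distillation loss computed against the frozen teacher diffusion model, and $\lambda$ is a weighting coefficient.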
| Property | Details |
|----------|---------|
| Developed by | Stability AI |
| Funded by | Stability AI |
| Model Type | Generative text-to-image model |
| Finetuned from model | Stable Diffusion 2.1 |
Model Sources
For research purposes, we recommend our `generative-models` GitHub repository (https://github.com/Stability-AI/generative-models), which implements the most popular diffusion frameworks (both training and inference).
- Repository: https://github.com/Stability-AI/generative-models
- Paper: https://stability.ai/research/adversarial-diffusion-distillation
- Demo [for the bigger SDXL-Turbo]: http://clipdrop.co/stable-diffusion-turbo
Evaluation
User preference studies compare SD-Turbo against other single- and multi-step models. Evaluated at a single step, SD-Turbo is preferred by human voters over LCM-LoRA XL and LCM-LoRA 1.5 in terms of both image quality and prompt following.
Note: For better quality, the bigger version SDXL-Turbo is recommended. For details on the user study, refer to the research paper.
Uses
Direct Use
The model can be used for both non-commercial and commercial purposes. Possible research areas and tasks include:
- Research on generative models.
- Research on real-time applications of generative models.
- Research on the impact of real-time generative models.
- Safe deployment of models that may generate harmful content.
- Probing and understanding the limitations and biases of generative models.
- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
For commercial use, refer to https://stability.ai/membership. Excluded uses are described below.
Out-of-Scope Use
The model was not trained to produce factual or accurate representations of people or events, so generating such content is out of scope for this model. The model should not be used in any way that violates Stability AI's Acceptable Use Policy.
Limitations and Bias
Limitations
- The quality and prompt alignment are lower than those of SDXL-Turbo.
- The generated images have a fixed resolution (512x512 pixels), and the model does not achieve perfect photorealism.
- The model cannot render legible text.
- Faces and people may not be generated properly.
- The autoencoding part of the model is lossy.
Recommendations
The model is suitable for both non-commercial and commercial use.