# ⚡ Flash Diffusion: FlashSDXL ⚡
Flash Diffusion is a diffusion distillation method that accelerates image generation, reducing the number of required sampling steps to as few as 4. This 108M-parameter LoRA-distilled version of the SDXL model reproduces the main results of the related paper. Check out our live demo and official GitHub repo.
## 🚀 Quick Start
The model can be used directly with the `DiffusionPipeline` from the `diffusers` library, reducing the required sampling steps to 4.
## 💻 Usage Examples
### Basic Usage
```python
from diffusers import DiffusionPipeline, LCMScheduler

adapter_id = "jasperai/flash-sdxl"

# Load the SDXL base pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_safetensors=True,
)

# Flash Diffusion expects an LCM scheduler with trailing timestep spacing
pipe.scheduler = LCMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="scheduler",
    timestep_spacing="trailing",
)
pipe.to("cuda")

# Load and fuse the Flash Diffusion LoRA
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()

prompt = "A raccoon reading a book in a lush forest."

# 4 sampling steps, classifier-free guidance disabled (guidance_scale=0)
image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
```
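The call above returns a standard PIL image. As a small follow-up sketch (the output filename is arbitrary), you can save it directly; `diffusers` also lets you revert the fused adapter later via `unfuse_lora()` if you need the plain SDXL weights back:

```python
image.save("raccoon.png")  # arbitrary output filename

# Optional: revert to the plain SDXL weights later
# pipe.unfuse_lora()
# pipe.unload_lora_weights()
```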
### Advanced Usage
#### Using in ComfyUI

To use FlashSDXL locally with ComfyUI, follow these steps:

- Ensure your ComfyUI installation is up to date.
- Download the checkpoint from Hugging Face: navigate to "Files and versions", go to the `comfy/` folder, and click the download button next to `FlashSDXL.safetensors` (or script the download as sketched after this list).
- Move the new checkpoint file to your local `ComfyUI/models/loras/` folder.
- Use it as a LoRA on top of `sd_xl_base_1.0_0.9vae.safetensors`. A simple ComfyUI `workflow.json` is provided in this repo (available in the same `comfy/` folder).
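If you prefer to script the download, here is a minimal sketch using `huggingface_hub`; the `comfy/FlashSDXL.safetensors` filename follows the steps above, while the destination path is an assumption you should adjust to your installation:

```python
from huggingface_hub import hf_hub_download

# Fetch the ComfyUI LoRA checkpoint from the Hub.
# Note: local_dir preserves the repo's folder layout, so the file lands in
# ComfyUI/models/loras/comfy/ and may need to be moved up one level.
hf_hub_download(
    repo_id="jasperai/flash-sdxl",
    filename="comfy/FlashSDXL.safetensors",
    local_dir="ComfyUI/models/loras",  # assumed ComfyUI install path
)
```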
#### ⚠️ Important Note
The model has been trained to work with a CFG scale of 1 and an LCM scheduler, though these parameters can be slightly adjusted.
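As a minimal sketch of such adjustments, reusing `pipe` and `prompt` from the Basic Usage example above (the step counts other than 4 are illustrative, not recommendations from the paper):

```python
# The model is distilled for 4 steps; neighboring values are shown only
# to illustrate the kind of adjustment the note refers to.
for steps in (2, 4, 8):
    image = pipe(prompt, num_inference_steps=steps, guidance_scale=0).images[0]
    image.save(f"sample_{steps}_steps.png")
```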
### Combining with Existing LoRAs 🎨
FlashSDXL can be combined with existing LoRAs to enable few-step generation without training. It can be directly integrated into Hugging Face pipelines.
```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

user_lora_id = "TheLastBen/Papercut_SDXL"
trigger_word = "papercut"

flash_lora_id = "jasperai/flash-sdxl"

# Load the SDXL base pipeline (fp16 weight variant)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
)

# Switch to the LCM scheduler expected by Flash Diffusion
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load both LoRAs and activate them together
pipe.load_lora_weights(flash_lora_id, adapter_name="flash")
pipe.load_lora_weights(user_lora_id, adapter_name="lora")
pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 1.0])

pipe.to(device="cuda", dtype=torch.float16)

prompt = f"{trigger_word} a cute corgi"

image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
```
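If the style LoRA is too strong or too weak at 4 steps, the relative adapter weights can be rebalanced. A small sketch, where the 0.8 weight is an arbitrary illustrative value rather than a tuned recommendation:

```python
# Keep the distillation LoRA at full strength and soften the style LoRA
pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 0.8])  # 0.8 is illustrative
image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
```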
#### 💡 Usage Tip
You can also use additional LoRAs with the provided Comfy workflow and test them on your machine.
### Combining with Existing ControlNets 🎨
FlashSDXL can be combined with existing ControlNets to enable few-step generation without training. It can be directly integrated into Hugging Face pipelines.
```python
import torch
import cv2
import numpy as np
from PIL import Image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid

flash_lora_id = "jasperai/flash-sdxl"

# Build a Canny edge map to condition the ControlNet
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((1024, 1024))
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None].repeat(3, 2)  # replicate the single channel to RGB
canny_image = Image.fromarray(image)

# Load the Canny ControlNet and the SDXL base pipeline
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    variant="fp16",
).to("cuda")

# Switch to the LCM scheduler, then load and fuse the Flash Diffusion LoRA
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights(flash_lora_id)
pipe.fuse_lora()

image = pipe(
    "picture of the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=0,
    controlnet_conditioning_scale=0.5,
    cross_attention_kwargs={"scale": 1},
).images[0]

# Show the edge map and the generated image side by side
grid = make_image_grid([canny_image, image], rows=1, cols=2)
```
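`make_image_grid` returns a PIL image, so the comparison can be saved directly. As a further hedged sketch, `controlnet_conditioning_scale` is the main knob for trading prompt freedom against edge adherence; 0.5 is the value used above, and the other values are arbitrary examples:

```python
grid.save("canny_comparison.png")  # arbitrary output filename

# Illustrative sweep: larger values follow the Canny edges more strictly
for scale in (0.3, 0.5, 0.8):
    out = pipe(
        "picture of the mona lisa",
        image=canny_image,
        num_inference_steps=4,
        guidance_scale=0,
        controlnet_conditioning_scale=scale,
        cross_attention_kwargs={"scale": 1},
    ).images[0]
    out.save(f"mona_lisa_cnet_{scale}.png")
```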
## 🔧 Technical Details
The model was trained for 20k iterations on 4 H100 GPUs (approximately 176 GPU hours in total). For further parameter details, please refer to the paper.
| Property | Details |
|----------|---------|
| Model Type | 108M LoRA distilled version of SDXL |
| Training Data | Not specified; refer to the paper |
Metrics on COCO 2014 validation (Table 3)
- FID-10k: 21.62 (4 NFE)
- CLIP Score: 0.327 (4 NFE)
## 📚 Documentation
### Citation
If you find this work useful or use it in your research, please consider citing us:
```bibtex
@misc{chadebec2024flash,
  title={Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation},
  author={Clement Chadebec and Onur Tasar and Eyal Benaroche and Benjamin Aubin},
  year={2024},
  eprint={2406.02347},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
## 📄 License
This model is released under the Creative Commons BY-NC license.