# ⚡ Flash Diffusion: FlashSDXL ⚡
Flash Diffusion is a diffusion distillation method that accelerates image generation, reducing the number of required sampling steps to as few as 4. This 108M-parameter LoRA-distilled version of the SDXL model reproduces the main results of the related paper. Check out our live demo and official GitHub repo.
## 🚀 Quick Start
The model can be used directly with the `DiffusionPipeline` from the `diffusers` library, reducing the required sampling steps to 4.
## 💻 Usage Examples
### Basic Usage
```python
from diffusers import DiffusionPipeline, LCMScheduler

adapter_id = "jasperai/flash-sdxl"

# Load the SDXL base pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_safetensors=True,
)

# Flash Diffusion expects an LCM scheduler with trailing timestep spacing
pipe.scheduler = LCMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="scheduler",
    timestep_spacing="trailing",
)
pipe.to("cuda")

# Load and fuse the Flash Diffusion LoRA
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()

prompt = "A raccoon reading a book in a lush forest."

# 4 sampling steps, classifier-free guidance disabled (guidance_scale=0)
image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
```
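The call above returns a standard PIL image. As a small follow-up sketch (the output filename is arbitrary), you can save it directly; `diffusers` also lets you revert the fused adapter later via `unfuse_lora()` if you need the plain SDXL weights back:

```python
image.save("raccoon.png")  # arbitrary output filename

# Optional: revert to the plain SDXL weights later
# pipe.unfuse_lora()
# pipe.unload_lora_weights()
```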
### Advanced Usage
#### Using in ComfyUI

To use FlashSDXL locally with ComfyUI, follow these steps:

- Ensure your ComfyUI installation is up to date.
- Download the checkpoint from Hugging Face: navigate to "Files and versions", go to the `comfy/` folder, and click the download button next to `FlashSDXL.safetensors` (or script the download as sketched after this list).
- Move the new checkpoint file to your local `ComfyUI/models/loras/` folder.
- Use it as a LoRA on top of `sd_xl_base_1.0_0.9vae.safetensors`. A simple ComfyUI `workflow.json` is provided in this repo (available in the same `comfy/` folder).
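If you prefer to script the download, here is a minimal sketch using `huggingface_hub`; the `comfy/FlashSDXL.safetensors` filename follows the steps above, while the destination path is an assumption you should adjust to your installation:

```python
from huggingface_hub import hf_hub_download

# Fetch the ComfyUI LoRA checkpoint from the Hub.
# Note: local_dir preserves the repo's folder layout, so the file lands in
# ComfyUI/models/loras/comfy/ and may need to be moved up one level.
hf_hub_download(
    repo_id="jasperai/flash-sdxl",
    filename="comfy/FlashSDXL.safetensors",
    local_dir="ComfyUI/models/loras",  # assumed ComfyUI install path
)
```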
#### ⚠️ Important Note
The model has been trained to work with a CFG scale of 1 and an LCM scheduler, though these parameters can be slightly adjusted.
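As a minimal sketch of such adjustments, reusing `pipe` and `prompt` from the Basic Usage example above (the step counts other than 4 are illustrative, not recommendations from the paper):

```python
# The model is distilled for 4 steps; neighboring values are shown only
# to illustrate the kind of adjustment the note refers to.
for steps in (2, 4, 8):
    image = pipe(prompt, num_inference_steps=steps, guidance_scale=0).images[0]
    image.save(f"sample_{steps}_steps.png")
```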
### Combining with Existing LoRAs 🎨
FlashSDXL can be combined with existing LoRAs to enable few-step generation without training. It can be directly integrated into Hugging Face pipelines.
```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

user_lora_id = "TheLastBen/Papercut_SDXL"
trigger_word = "papercut"

flash_lora_id = "jasperai/flash-sdxl"

# Load the SDXL base pipeline (fp16 weight variant)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
)

# Switch to the LCM scheduler expected by Flash Diffusion
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load both LoRAs and activate them together
pipe.load_lora_weights(flash_lora_id, adapter_name="flash")
pipe.load_lora_weights(user_lora_id, adapter_name="lora")
pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 1.0])

pipe.to(device="cuda", dtype=torch.float16)

prompt = f"{trigger_word} a cute corgi"

image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
```
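If the style LoRA is too strong or too weak at 4 steps, the relative adapter weights can be rebalanced. A small sketch, where the 0.8 weight is an arbitrary illustrative value rather than a tuned recommendation:

```python
# Keep the distillation LoRA at full strength and soften the style LoRA
pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 0.8])  # 0.8 is illustrative
image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
```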
#### 💡 Usage Tip
You can also use additional LoRAs with the provided Comfy workflow and test them on your machine.
### Combining with Existing ControlNets 🎨
FlashSDXL can be combined with existing ControlNets to enable few-step generation without training. It can be directly integrated into Hugging Face pipelines.
```python
import torch
import cv2
import numpy as np
from PIL import Image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid

flash_lora_id = "jasperai/flash-sdxl"

# Build a Canny edge map to condition the ControlNet
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((1024, 1024))
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None].repeat(3, 2)  # replicate the single channel to RGB
canny_image = Image.fromarray(image)

# Load the Canny ControlNet and the SDXL base pipeline
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    variant="fp16",
).to("cuda")

# Switch to the LCM scheduler, then load and fuse the Flash Diffusion LoRA
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights(flash_lora_id)
pipe.fuse_lora()

image = pipe(
    "picture of the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=0,
    controlnet_conditioning_scale=0.5,
    cross_attention_kwargs={"scale": 1},
).images[0]

# Show the edge map and the generated image side by side
grid = make_image_grid([canny_image, image], rows=1, cols=2)
```
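`make_image_grid` returns a PIL image, so the comparison can be saved directly. As a further hedged sketch, `controlnet_conditioning_scale` is the main knob for trading prompt freedom against edge adherence; 0.5 is the value used above, and the other values are arbitrary examples:

```python
grid.save("canny_comparison.png")  # arbitrary output filename

# Illustrative sweep: larger values follow the Canny edges more strictly
for scale in (0.3, 0.5, 0.8):
    out = pipe(
        "picture of the mona lisa",
        image=canny_image,
        num_inference_steps=4,
        guidance_scale=0,
        controlnet_conditioning_scale=scale,
        cross_attention_kwargs={"scale": 1},
    ).images[0]
    out.save(f"mona_lisa_cnet_{scale}.png")
```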
## 🔧 Technical Details
The model was trained for 20k iterations on 4 H100 GPUs (approximately 176 GPU hours in total). For further parameter details, please refer to the paper.
| Property | Details |
|----------|---------|
| Model Type | 108M LoRA distilled version of SDXL |
| Training Data | Not specified; refer to the paper |
Metrics on COCO 2014 validation (Table 3)
- FID-10k: 21.62 (4 NFE)
- CLIP Score: 0.327 (4 NFE)
## 📚 Documentation
### Citation
If you find this work useful or use it in your research, please consider citing us:
```bibtex
@misc{chadebec2024flash,
  title={Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation},
  author={Clement Chadebec and Onur Tasar and Eyal Benaroche and Benjamin Aubin},
  year={2024},
  eprint={2406.02347},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
## 📄 License
This model is released under the Creative Commons BY-NC license.