Flash SDXL: an open-source model that generates high-quality images in 4 steps

Flash Sdxl

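Developed by jasperai

Flash Diffusion is a diffusion distillation method that generates high-quality images in 4 steps. This model is a 108M-parameter LoRA distilled version of SDXL.

Image generation #4-step generation #Diffusion distillation #LoRA fine-tuning

Downloads: 84

Released: 6/2/2024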

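Model Overview

Flash Diffusion is a diffusion distillation method proposed by the Jasper Research team that generates high-quality images in very few steps. This model is distilled from the SDXL base model for fast image generation.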

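Model Features

Very fast generation

Generates high-quality images in only 4 inference steps

Efficient distillation

Diffusion distillation cuts the number of sampling steps, and with it the generation time

Broad compatibility

Can be combined with existing LoRA and ControlNet models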

Model Capabilities

Text-to-image generation

Fast image synthesis

Stylized image generation

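Use Cases

Creative design

Concept art creation

Quickly generate concept art images

Produces high-quality concept art in 4 steps

Product prototyping

Quickly generate product design mockups

Speeds up design iteration

Education and entertainment

Story illustration

Quickly generate story illustrations from text descriptions

Example image: a raccoon reading a book in a forest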

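🚀 ⚡ Flash Diffusion: FlashSDXL ⚡

Flash Diffusion is a diffusion distillation method proposed in the paper "Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation" by Clément Chadebec, Onur Tasar, Eyal Benaroche, and Benjamin Aubin from Jasper Research. This model is a 108M-parameter LoRA distilled version of the SDXL model that generates images in 4 steps. Its main purpose is to reproduce the main results of the paper.

See our live demo and the official GitHub repository.

🚀 Quick Start

The model can be used directly with DiffusionPipeline from the diffusers library and reduces the number of required sampling steps to 4.

💻 Usage Examples

Basic usage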

from diffusers import DiffusionPipeline, LCMScheduler

adapter_id = "jasperai/flash-sdxl"

pipe = DiffusionPipeline.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  use_safetensors=True,
)

pipe.scheduler = LCMScheduler.from_pretrained(
  "stabilityai/stable-diffusion-xl-base-1.0",
  subfolder="scheduler",
  timestep_spacing="trailing",
)
pipe.to("cuda")

# Load the Flash LoRA weights, then fuse them into the base model
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()

prompt = "A raccoon reading a book in a lush forest."

image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]

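Using the model in ComfyUI

To run FlashSDXL locally with ComfyUI, you need to:

Make sure your ComfyUI installation is up to date.
Download the checkpoint from Hugging Face. If you are unsure how, go to "Files and versions", open the comfy/ folder, and click the download button next to FlashSDXL.safetensors.
Move the new checkpoint file into your local comfyUI/models/loras/ folder.
Apply it as a LoRA on top of sd_xl_base_1.0_0.9vae.safetensors. A simple ComfyUI workflow.json file is provided in this repository (in the same comfy/ folder).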
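
The download and move steps can also be scripted. The following is a minimal sketch, assuming the huggingface_hub package is installed and that the checkpoint sits at comfy/FlashSDXL.safetensors in the jasperai/flash-sdxl repository, as the steps above describe. The helper names lora_destination and install_flash_lora are made up for this illustration.

```python
from pathlib import Path
import shutil

REPO_ID = "jasperai/flash-sdxl"
# Assumed path inside the repository, based on the comfy/ folder named in the steps.
CHECKPOINT = "comfy/FlashSDXL.safetensors"


def lora_destination(comfy_root):
    """Return where the checkpoint should live inside a ComfyUI install."""
    return Path(comfy_root) / "models" / "loras" / Path(CHECKPOINT).name


def install_flash_lora(comfy_root):
    """Download the checkpoint to the Hugging Face cache and copy it into ComfyUI."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    cached_file = hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINT)
    destination = lora_destination(comfy_root)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached_file, destination)
    return destination
```

Calling install_flash_lora("comfyUI") would place the file in comfyUI/models/loras/. Adjust the root folder to match your own installation.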

⚠️ Important note

This model was trained for use with a CFG scale of 1 and the LCM scheduler, though these parameters can be adjusted somewhat.
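
To see why a CFG scale of 1 amounts to "no guidance", here is a toy sketch of the usual classifier-free guidance combination on plain Python lists. The function name cfg_combine and the numbers are illustrative only, and this is not the diffusers implementation. The diffusers SDXL pipelines also appear to skip the unconditional branch whenever guidance_scale is at most 1, which would explain why the examples pass guidance_scale=0; treat that as an assumption to check against your installed version.

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Made-up noise predictions for two latent values.
uncond = [0.5, -1.0]
cond = [2.0, 1.0]

# With a scale of 1 the result is exactly the conditional prediction.
no_guidance = cfg_combine(uncond, cond, 1.0)

# A larger scale pushes the prediction further away from the unconditional one.
strong_guidance = cfg_combine(uncond, cond, 7.5)
```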

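Combining with existing LoRAs 🎨

FlashSDXL can also be combined with existing LoRAs to enable few-step image generation in a training-free manner. It integrates directly into Hugging Face pipelines. Here is an example: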

from diffusers import DiffusionPipeline, LCMScheduler
import torch

user_lora_id = "TheLastBen/Papercut_SDXL"
trigger_word = "papercut"

flash_lora_id = "jasperai/flash-sdxl"

# Load Pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16"
)

# Set scheduler
pipe.scheduler = LCMScheduler.from_config(
    pipe.scheduler.config
)

# Load LoRAs
pipe.load_lora_weights(flash_lora_id, adapter_name="flash")
pipe.load_lora_weights(user_lora_id, adapter_name="lora")

pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 1.0])
pipe.to(device="cuda", dtype=torch.float16)

prompt = f"{trigger_word} a cute corgi"

image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=0
).images[0]
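
One way to picture why this combination needs no extra training: each LoRA contributes a low-rank update to a base weight, and enabling two adapters adds both updates, each scaled by its adapter weight. The toy sketch below uses tiny made-up matrices and hypothetical helper names. It omits the alpha/rank scaling that real LoRA layers apply and is not how diffusers or PEFT implement set_adapters.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_adapters(base, adapters, weights):
    """Return base + sum_i weights[i] * (B_i @ A_i) without modifying base."""
    merged = [row[:] for row in base]
    for (b, a), weight in zip(adapters, weights):
        delta = matmul(b, a)  # rank-limited update contributed by one adapter
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += weight * value
    return merged


base = [[1.0, 0.0], [0.0, 1.0]]
flash = ([[1.0], [0.0]], [[0.0, 2.0]])  # (B, A) pair standing in for the Flash LoRA
style = ([[0.0], [1.0]], [[3.0, 0.0]])  # (B, A) pair standing in for a style LoRA

# Full-strength "flash" adapter plus a half-strength "style" adapter.
merged = merge_adapters(base, [flash, style], [1.0, 0.5])
```

Lowering the second weight, as in this toy call, corresponds to passing a smaller value in adapter_weights when the style LoRA is too dominant.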

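💡 Usage tip

You can also use the provided Comfy workflow to apply additional LoRAs and test them on your local machine.

Combining with existing ControlNets 🎨

FlashSDXL can also be combined with existing ControlNets to enable few-step image generation in a training-free manner. It integrates directly into Hugging Face pipelines. Here is an example: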

import torch
import cv2
import numpy as np
from PIL import Image

from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid

flash_lora_id = "jasperai/flash-sdxl"

image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((1024, 1024))

image = np.array(image)

image = cv2.Canny(image, 100, 200)
image = image[:, :, None].repeat(3, 2)
canny_image = Image.fromarray(image)

# Load ControlNet
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    safety_checker=None,
    variant="fp16"
).to("cuda")

# Set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load LoRA
pipe.load_lora_weights(flash_lora_id)
pipe.fuse_lora()

image = pipe(
    "picture of the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=0,
    controlnet_conditioning_scale=0.5,
    cross_attention_kwargs={"scale": 1},
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)

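🔧 Technical Details

The model was trained for 20,000 iterations on 4 H100 GPUs (approximately 176 GPU hours of training in total). See the paper for more details on the parameters.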
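
As a rough sanity check on those figures, the arithmetic below converts GPU hours into wall-clock time and time per iteration. It assumes all four GPUs ran in parallel for the whole run and that the 176 GPU hours cover training only; neither assumption is stated in the source.

```python
gpu_hours = 176      # total GPU hours reported above
num_gpus = 4         # H100 GPUs used in parallel (assumed for the full run)
iterations = 20_000  # training iterations reported above

# Wall-clock duration if every GPU was busy for the entire run.
wall_clock_hours = gpu_hours / num_gpus

# Average wall-clock time spent on one training iteration.
seconds_per_iteration = wall_clock_hours * 3600 / iterations

print(f"{wall_clock_hours:.0f} h wall-clock, {seconds_per_iteration:.2f} s per iteration")
```

Under these assumptions the run takes about 44 hours of wall-clock time, or roughly 7.9 seconds per iteration.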

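Metrics on the COCO 2014 validation set (Table 3)

FID-10k: 21.62 (4 NFE)
CLIP score: 0.327 (4 NFE)

📚 Detailed Documentation

Citation

If you find this work useful or use it in your research, please consider citing us: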

@misc{chadebec2024flash,
      title={Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation}, 
      author={Clement Chadebec and Onur Tasar and Eyal Benaroche and Benjamin Aubin},
      year={2024},
      eprint={2406.02347},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}