Open-source Flash SDXL model: lightning-fast 4-step generation of high-quality images, free to use!


Flash Sdxl

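Developed by jasperai

Flash Diffusion is a diffusion distillation method that generates high-quality images in 4 steps. This model is a 108M-parameter LoRA distilled version of SDXL.

Image Generation #4-Step Fast Generation #Diffusion Model Distillation #LoRA Fine-tuning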


Released: 6/2/2024

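Model Overview

Flash Diffusion is a diffusion distillation method proposed by the Jasper Research team that generates high-quality images in very few steps. This model is distilled from the SDXL base model to enable fast image generation.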

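Model Features

Fast Generation

Generates high-quality images in only 4 inference steps

Efficient Distillation

Diffusion distillation sharply cuts generation time by reducing the number of sampling steps

Broad Compatibility

Can be combined with existing LoRA and ControlNet models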

Model Capabilities

Text-to-image generation

Fast image synthesis

Stylized image generation

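Use Cases

Creative Design

Concept Art

Quickly generate creative concept-art images

Produces high-quality concept art in 4 steps

Product Prototyping

Quickly generate product-design prototype images

Speeds up design iteration

Education and Entertainment

Story Illustration

Quickly generate story illustrations from text descriptions

[Example image: a raccoon reading a book in a forest]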

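🚀 ⚡ Flash Diffusion: FlashSDXL ⚡

Flash Diffusion is a diffusion distillation method proposed in the paper Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation by Clément Chadebec, Onur Tasar, Eyal Benaroche, and Benjamin Aubin of Jasper Research. This model is a 108M-parameter LoRA distilled version of SDXL that generates images in 4 steps. Its main purpose is to reproduce the main results of the paper.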
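The 108M figure counts only the LoRA weights, not the SDXL base model. As a rough illustration of where such a count comes from, the sketch below computes how many parameters a LoRA adapter adds to a single linear layer, assuming the standard LoRA factorization W + BA with B of shape d_out × r and A of shape r × d_in. The layer size and rank are hypothetical and are not the FlashSDXL configuration; the real total depends on which layers the adapter targets and at which rank.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA learns B (d_out x rank) and A (rank x d_in) instead of a full
    d_out x d_in update, so it adds rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in a dense d_out x d_in weight update (bias ignored)."""
    return d_in * d_out


# Hypothetical layer size and rank, chosen only for illustration;
# they are NOT the actual FlashSDXL settings.
d_in = d_out = 1280
rank = 64
print(lora_param_count(d_in, d_out, rank))  # 163840
print(full_param_count(d_in, d_out))        # 1638400
```

At this hypothetical rank the adapter stores about 10% of the parameters of a full update to the same layer; summing such per-layer counts over every adapted layer gives the adapter's total size.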

Check out our live demo and the official GitHub repository.

🚀 Quick Start

The model can be used directly with the DiffusionPipeline from the diffusers library, reducing the number of required sampling steps to 4.

💻 Usage Examples

Basic Usage

from diffusers import DiffusionPipeline, LCMScheduler

adapter_id = "jasperai/flash-sdxl"

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_safetensors=True,
)

# Swap in the LCM scheduler with trailing timestep spacing
pipe.scheduler = LCMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="scheduler",
    timestep_spacing="trailing",
)
pipe.to("cuda")

# Load the Flash LoRA weights, then fuse them into the base model
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()

prompt = "A raccoon reading a book in a lush forest."

image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
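pipe.fuse_lora() merges the LoRA update into the base weights, so inference afterwards needs no separate low-rank branch. The toy example below, written with plain Python lists, assumes the standard LoRA form W + scale · (B A) and checks that the fused and unfused layers give the same output; it illustrates the idea only and is not diffusers' implementation. All matrices are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def add_scaled(a, b, scale):
    """Element-wise a + scale * b."""
    return [[a_ij + scale * b_ij for a_ij, b_ij in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy 3x3 base weight W and a rank-1 LoRA update B @ A (arbitrary values).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
B = [[1.0], [0.5], [0.0]]   # d_out x r
A = [[0.0, 2.0, 1.0]]       # r x d_in
scale = 1.0
x = [1.0, 2.0, 3.0]

# Unfused: base output plus the separate low-rank branch B(Ax).
unfused = [w + scale * d for w, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Fused: fold scale * (B @ A) into W once, then run a single matrix-vector product.
W_fused = add_scaled(W, matmul(B, A), scale)
fused = matvec(W_fused, x)

print(unfused)  # [14.0, 5.5, 6.0]
print(fused)    # [14.0, 5.5, 6.0]
```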

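Using in ComfyUI

To run FlashSDXL locally with ComfyUI, you need to:

Make sure your ComfyUI installation is up to date.
Download the checkpoint from Hugging Face. If you are not sure how, open "Files and versions", go to the comfy/ folder, and click the download button next to FlashSDXL.safetensors.
Move the checkpoint file into your local comfyUI/models/loras/ folder.
Apply it as a LoRA on top of sd_xl_base_1.0_0.9vae.safetensors. A simple ComfyUI workflow.json is provided in this repository (in the same comfy/ folder).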
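The download and copy steps can also be scripted. The sketch below assumes the huggingface_hub package is installed and uses its hf_hub_download function to fetch comfy/FlashSDXL.safetensors from the jasperai/flash-sdxl repository; the helper names and the ComfyUI root path you pass in are hypothetical.

```python
import shutil
from pathlib import Path


def comfy_loras_dir(comfy_root: str) -> Path:
    """Return the LoRA folder of a local ComfyUI install: <root>/models/loras."""
    return Path(comfy_root) / "models" / "loras"


def install_flash_lora(comfy_root: str) -> Path:
    """Download comfy/FlashSDXL.safetensors and copy it into ComfyUI's LoRA folder.

    Hypothetical helper. Requires the huggingface_hub package and network access.
    """
    # Imported lazily so comfy_loras_dir works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    # Downloads into the local Hugging Face cache and returns the cached file path.
    cached = hf_hub_download(
        repo_id="jasperai/flash-sdxl",
        filename="FlashSDXL.safetensors",
        subfolder="comfy",
    )
    dest_dir = comfy_loras_dir(comfy_root)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "FlashSDXL.safetensors"
    # copy2 follows symlinks, so the real file content is copied out of the cache.
    shutil.copy2(cached, dest)
    return dest
```

For example, install_flash_lora("ComfyUI") would place the file at ComfyUI/models/loras/FlashSDXL.safetensors, after which it can be applied as a LoRA in the provided workflow.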

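⚠️ Important Note

The model was trained with a CFG scale of 1 and the LCM scheduler, but these parameters can be tweaked somewhat. In diffusers, any guidance_scale of 1 or lower disables classifier-free guidance, so the guidance_scale=0 used in the examples behaves the same as a scale of 1.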
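Why guidance_scale=0 and a CFG scale of 1 lead to the same result can be sketched in plain Python. Scalars stand in for noise-prediction tensors and the function names are hypothetical; uses_cfg mirrors the "scale > 1" check used by the diffusers SDXL pipelines. The same rule explains why 4 steps without guidance cost 4 function evaluations (NFE).

```python
def cfg_combine(eps_uncond: float, eps_cond: float, guidance_scale: float) -> float:
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def uses_cfg(guidance_scale: float) -> bool:
    """Guidance is applied only when the scale is above 1 (mirrors the diffusers SDXL check)."""
    return guidance_scale > 1


def noise_prediction(eps_uncond: float, eps_cond: float, guidance_scale: float) -> float:
    """Noise prediction actually used at one denoising step."""
    if not uses_cfg(guidance_scale):
        # CFG is skipped entirely and the conditional prediction is used as is,
        # so guidance_scale=0 does NOT select the unconditional branch.
        return eps_cond
    return cfg_combine(eps_uncond, eps_cond, guidance_scale)


def unet_evaluations(num_inference_steps: int, guidance_scale: float) -> int:
    """Function evaluations per image: 2 per step with CFG (cond + uncond), else 1."""
    return num_inference_steps * (2 if uses_cfg(guidance_scale) else 1)


print(noise_prediction(0.25, 0.75, 0))    # 0.75
print(noise_prediction(0.25, 0.75, 2.0))  # 1.25
print(unet_evaluations(4, 0))             # 4
print(unet_evaluations(4, 7.5))           # 8
```

Note that cfg_combine at w = 1 also returns the conditional prediction (0.25 + 1 · 0.5 = 0.75), which is why a training scale of 1 and disabled guidance coincide.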

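Combining with Existing LoRAs 🎨

FlashSDXL can also be combined with existing LoRAs to enable few-step image generation without any training. It integrates directly into Hugging Face pipelines. Here is an example: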

from diffusers import DiffusionPipeline, LCMScheduler
import torch

user_lora_id = "TheLastBen/Papercut_SDXL"
trigger_word = "papercut"

flash_lora_id = "jasperai/flash-sdxl"

# Load Pipeline
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16"
)

# Set scheduler
pipe.scheduler = LCMScheduler.from_config(
    pipe.scheduler.config
)

# Load LoRAs
pipe.load_lora_weights(flash_lora_id, adapter_name="flash")
pipe.load_lora_weights(user_lora_id, adapter_name="lora")

pipe.set_adapters(["flash", "lora"], adapter_weights=[1.0, 1.0])
pipe.to(device="cuda", dtype=torch.float16)

prompt = f"{trigger_word} a cute corgi"

image = pipe(
    prompt,
    num_inference_steps=4,
    guidance_scale=0
).images[0]
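set_adapters activates both LoRAs with the given adapter_weights. Assuming each adapter contributes its own scaled low-rank update to a layer's weight (as PEFT-backed LoRA layers do for linear layers), the effective weight is a weighted sum. The toy sketch below uses hypothetical 2 × 2 matrices and a hypothetical helper name; it is not the diffusers implementation.

```python
def combine_adapters(base, deltas, weights):
    """Return base + sum_i weights[i] * deltas[i], element-wise.

    Each delta stands in for one LoRA's low-rank update (B_i @ A_i).
    """
    if len(deltas) != len(weights):
        raise ValueError("need exactly one weight per adapter")
    combined = [row[:] for row in base]  # copy so the base weight is untouched
    for delta, weight in zip(deltas, weights):
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                combined[i][j] += weight * value
    return combined


# Hypothetical 2 x 2 base weight and two adapter updates.
base = [[1.0, 0.0], [0.0, 1.0]]
flash_delta = [[0.5, 0.0], [0.0, -0.5]]
style_delta = [[0.0, 0.25], [0.25, 0.0]]

print(combine_adapters(base, [flash_delta, style_delta], [1.0, 1.0]))
# [[1.5, 0.25], [0.25, 0.5]]
print(combine_adapters(base, [flash_delta, style_delta], [1.0, 0.5]))
# [[1.5, 0.125], [0.125, 0.5]]
```

Lowering the style LoRA's weight, for example adapter_weights=[1.0, 0.5], halves its contribution while leaving the Flash LoRA's contribution unchanged.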

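💡 Usage Tip

You can also use the provided Comfy workflow to apply additional LoRAs and test them on your local machine.

Combining with Existing ControlNets 🎨

FlashSDXL can also be combined with existing ControlNets to enable few-step image generation without any training. It integrates directly into Hugging Face pipelines. Here is an example: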

import torch
import cv2
import numpy as np
from PIL import Image

from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid

flash_lora_id = "jasperai/flash-sdxl"

image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
).resize((1024, 1024))

image = np.array(image)

# Extract Canny edges, then repeat the single edge channel to build an RGB control image
image = cv2.Canny(image, 100, 200)
image = image[:, :, None].repeat(3, 2)
canny_image = Image.fromarray(image)

# Load ControlNet
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    variant="fp16"
)
# Load SDXL with the Canny ControlNet attached (SDXL pipelines take no safety_checker argument)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16"
).to("cuda")

# Set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Load LoRA
pipe.load_lora_weights(flash_lora_id)
pipe.fuse_lora()

image = pipe(
    "picture of the mona lisa",
    image=canny_image,
    num_inference_steps=4,
    guidance_scale=0,
    controlnet_conditioning_scale=0.5,
    cross_attention_kwargs={"scale": 1},
).images[0]
make_image_grid([canny_image, image], rows=1, cols=2)
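controlnet_conditioning_scale=0.5 weakens the Canny guidance. In diffusers, the ControlNet's residual feature maps are multiplied by this scale before being added to the matching UNet features. The toy sketch below mimics that step with nested lists standing in for feature maps; the function name and values are hypothetical.

```python
def apply_controlnet_residuals(unet_features, control_residuals, conditioning_scale):
    """Add conditioning_scale * residual to each matching UNet feature map."""
    if len(unet_features) != len(control_residuals):
        raise ValueError("expected one residual per UNet feature map")
    return [
        [feature + conditioning_scale * residual for feature, residual in zip(fmap, rmap)]
        for fmap, rmap in zip(unet_features, control_residuals)
    ]


# Hypothetical flattened feature maps for two UNet blocks.
features = [[1.0, 2.0], [3.0]]
residuals = [[0.5, -1.0], [2.0]]

print(apply_controlnet_residuals(features, residuals, 0.5))  # [[1.25, 1.5], [4.0]]
print(apply_controlnet_residuals(features, residuals, 0.0))  # [[1.0, 2.0], [3.0]]
```

A scale of 0 leaves the UNet features untouched, while 1 applies the full control signal; 0.5 sits halfway between.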

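🔧 Technical Details

The model was trained for 20,000 iterations on 4 H100 GPUs (about 176 GPU hours of training in total). Please refer to the paper for further parameter details.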
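As a quick sanity check on these figures, assuming all 4 GPUs ran in parallel for the whole run, 176 GPU hours corresponds to about 44 hours of wall-clock time and roughly 7.9 seconds per iteration:

```python
def training_budget(gpu_hours: float, num_gpus: int, iterations: int) -> tuple[float, float]:
    """Return (wall-clock hours, seconds per iteration).

    Assumes every GPU was busy for the entire run, so wall-clock time is
    simply the GPU-hour total divided by the number of GPUs.
    """
    wall_clock_hours = gpu_hours / num_gpus
    seconds_per_iteration = wall_clock_hours * 3600 / iterations
    return wall_clock_hours, seconds_per_iteration


hours, sec_per_it = training_budget(176, 4, 20_000)
print(hours)                 # 44.0
print(round(sec_per_it, 2))  # 7.92
```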

Metrics on the COCO 2014 validation set (Table 3)

FID-10k: 21.62 (4 NFE)
CLIP score: 0.327 (4 NFE)
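For reference, FID compares Gaussian fits of Inception-v3 features extracted from real (r) and generated (g) images, where μ and Σ are the feature means and covariances; lower is better. The "10k" presumably refers to the number of images used in the evaluation; see the paper for the exact protocol. For the CLIP score, higher values indicate closer agreement between the prompt and the generated image.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```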

📚 Documentation

Citation

If you find this work useful or use it in your research, please consider citing us:

@misc{chadebec2024flash,
      title={Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation}, 
      author={Clement Chadebec and Onur Tasar and Eyal Benaroche and Benjamin Aubin},
      year={2024},
      eprint={2406.02347},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}