Hyper-SD: open-source diffusion model acceleration with free, fast inference across multiple base models

Hyper SD

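Developed by ByteDance

Hyper-SD is an advanced diffusion model acceleration technique that supports fast inference on several base models, including FLUX.1-dev, SD3, SDXL, and SD1.5.

Image generation #ultra-fast image generation #low-step inference #LoRA fine-tuning

Downloads: 129.08k

Release date: April 20, 2024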

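Model Overview

Hyper-SD is one of the latest diffusion model acceleration techniques. It combines distillation with LoRA to generate high-quality images in few inference steps.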

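Model Features

Efficient inference

Supports fast inference in 1 to 16 steps, which significantly speeds up generation.

Multi-model compatibility

Compatible with several base models, including FLUX.1-dev, SD3, SDXL, and SD1.5.

High-quality generation

Maintains high image quality even at very low inference step counts.

Flexible configuration

The LoRA scale and the guidance scale can both be adjusted to suit different needs.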

Model Capabilities

Text-to-image generation

Fast inference

High-quality image generation

Multi-model compatibility

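Use Cases

Creative design

Rapid concept design

Designers can quickly generate multiple concept sketches to speed up the creative process.

Generates high-quality concept images in 1 to 8 steps

Content creation

Social media content generation

Quickly generate visual content suited to social media.

Efficiently produces varied visual assets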

🚀 Hyper-SD

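Hyper-SD is one of the latest diffusion model acceleration techniques. This project releases models distilled from several base models for efficient text-to-image generation.

🚀 Quick Start

The official paper for this project is Hyper-SD, and the project page is https://hyper-sd.github.io/.

You can try our demos on Hugging Face.

✨ Key Features

Multi-model support: models distilled from FLUX.1-dev, SD3-Medium, SDXL Base 1.0, and Stable-Diffusion v1-5 are released.
Multi-step LoRA support: LoRA checkpoints are provided for different step counts, such as 8 and 16 steps, with support for different guidance scales.
ControlNet compatibility: highly compatible with different base models and ControlNets.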

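📦 Installation

To use the models in this project, install the required dependencies, such as diffusers and torch. The ControlNet examples further down also import cv2, numpy, and PIL, which are not covered by this command.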

pip install diffusers torch huggingface_hub safetensors
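
Before running the examples, it can help to confirm that the packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of this project.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names matching the pip command above.
    required = ["diffusers", "torch", "huggingface_hub", "safetensors"]
    missing = missing_packages(required)
    print("missing:", missing if missing else "none")
```

An empty result only means the modules can be located; it does not check versions or GPU support.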

💻 Usage Examples

Text-to-Image Examples

FLUX.1-dev-related models

import torch
from diffusers import FluxPipeline
from huggingface_hub import hf_hub_download
base_model_id = "black-forest-labs/FLUX.1-dev"
repo_name = "ByteDance/Hyper-SD"
# Take the 8-step LoRA as an example
ckpt_name = "Hyper-FLUX.1-dev-8steps-lora.safetensors"
# Load the model. Fill in your access token, since the FLUX.1-dev repo is gated.
pipe = FluxPipeline.from_pretrained(base_model_id, token="xxx")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=0.125)
pipe.to("cuda", dtype=torch.float16)
image=pipe(prompt="a photo of a cat", num_inference_steps=8, guidance_scale=3.5).images[0]
image.save("output.png")
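
The `lora_scale=0.125` argument controls how strongly the LoRA update is merged into the base weights. As a rough illustration (not the diffusers implementation), fusing can be viewed as adding a scaled low-rank product to each weight matrix: W' = W + scale * (B A). The tiny matrices and the helper names below are made up for this example, and the sketch omits the alpha/rank factor that real implementations also apply.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_lora_weight(weight, lora_b, lora_a, scale):
    """Return weight + scale * (lora_b @ lora_a) without modifying the inputs."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a rank-1 update (values chosen only for illustration).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]  # shape 2x1
A = [[4.0, 8.0]]    # shape 1x2

fused = fuse_lora_weight(W, B, A, scale=0.125)
print(fused)  # [[1.5, 1.0], [1.0, 3.0]]
```

A smaller scale keeps the fused weights closer to the base model, which is why the scale used at inference should match the one used in training.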

SD3-related models

import torch
from diffusers import StableDiffusion3Pipeline
from huggingface_hub import hf_hub_download
base_model_id = "stabilityai/stable-diffusion-3-medium-diffusers"
repo_name = "ByteDance/Hyper-SD"
# Take the 8-step LoRA as an example
ckpt_name = "Hyper-SD3-8steps-CFG-lora.safetensors"
# Load the model. Fill in your access token, since the SD3 repo is gated.
pipe = StableDiffusion3Pipeline.from_pretrained(base_model_id, token="xxx")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=0.125)
pipe.to("cuda", dtype=torch.float16)
image=pipe(prompt="a photo of a cat", num_inference_steps=8, guidance_scale=5.0).images[0]
image.save("output.png")

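SDXL-related models

2-step, 4-step, and 8-step LoRAs

The 2-step LoRA is used as an example here; you can use the other LoRAs with the matching number of inference steps.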

import torch
from diffusers import DiffusionPipeline, DDIMScheduler
from huggingface_hub import hf_hub_download
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "ByteDance/Hyper-SD"
# Take the 2-step LoRA as an example
ckpt_name = "Hyper-SDXL-2steps-lora.safetensors"
# Load the model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora()
# Make sure the DDIM scheduler's timestep spacing is set to "trailing"!
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=2, guidance_scale=0).images[0]
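
To see why the `trailing` spacing matters at very low step counts, the sketch below reproduces the timestep arithmetic in plain Python. It approximates what the DDIM scheduler computes, assuming 1000 training timesteps; the function names are illustrative, and the `leading` variant is shown only for comparison.

```python
def trailing_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Timesteps spaced backwards from the final training timestep."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [
        round(num_train_timesteps - i * step_ratio) - 1
        for i in range(num_inference_steps)
    ]


def leading_timesteps(num_inference_steps, num_train_timesteps=1000, steps_offset=0):
    """Timesteps spaced forwards from zero, then reversed for sampling."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [
        i * step_ratio + steps_offset
        for i in reversed(range(num_inference_steps))
    ]


print(trailing_timesteps(2))  # [999, 499]
print(leading_timesteps(2))   # [500, 0]
```

With `trailing`, two-step sampling starts from timestep 999, the noisiest training timestep, whereas `leading` would start from 500 and skip the high-noise half of the trajectory.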

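Unified LoRA (supports 1 to 8 inference steps)

You can freely adjust the number of inference steps and the eta value to get the best results.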

import torch
from diffusers import DiffusionPipeline, TCDScheduler
from huggingface_hub import hf_hub_download
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "ByteDance/Hyper-SD"
ckpt_name = "Hyper-SDXL-1step-lora.safetensors"
# Load the model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora()
# Use the TCD scheduler for better image quality
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
# For multi-step inference, a lower eta gives more detail
eta=1.0
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, eta=eta).images[0]
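
The effect of `eta` can be pictured through the intermediate timestep that TCD-style sampling targets at each step. The following is a simplified sketch, assuming the scheduler derives that timestep as floor((1 - eta) * next_timestep); the helper and the 4-step schedule are illustrative and not the library's code.

```python
import math


def tcd_intermediate_timesteps(timesteps, eta):
    """For each sampling step, return floor((1 - eta) * next_timestep).

    `timesteps` is the descending sampling schedule; the step after the
    last one is treated as timestep 0.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must be in [0, 1]")
    next_timesteps = list(timesteps[1:]) + [0]
    return [math.floor((1.0 - eta) * t) for t in next_timesteps]


schedule = [999, 749, 499, 249]  # an assumed 4-step schedule
for eta in (0.0, 0.5, 1.0):
    print(eta, tcd_intermediate_timesteps(schedule, eta))
```

Under this assumption, eta=1.0 denoises all the way to timestep 0 before noise is added back at every step, while eta=0.0 moves directly to the next timestep with no added noise.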

1-step SDXL UNet

For one-step inference only.

import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "ByteDance/Hyper-SD"
ckpt_name = "Hyper-SDXL-1step-Unet.safetensors"
# Load the model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo_name, ckpt_name), device="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
# Use the LCM scheduler instead of DDIM so that specific timesteps can be passed in
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
# For one-step inference, start from timestep 800 for better results
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, timesteps=[800]).images[0]

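SD1.5-related models

2-step, 4-step, and 8-step LoRAs

The 2-step LoRA is used as an example here; you can use the other LoRAs with the matching number of inference steps.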

import torch
from diffusers import DiffusionPipeline, DDIMScheduler
from huggingface_hub import hf_hub_download
base_model_id = "runwayml/stable-diffusion-v1-5"
repo_name = "ByteDance/Hyper-SD"
# Take the 2-step LoRA as an example
ckpt_name = "Hyper-SD15-2steps-lora.safetensors"
# Load the model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora()
# Make sure the DDIM scheduler's timestep spacing is set to "trailing"!
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=2, guidance_scale=0).images[0]

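Unified LoRA (supports 1 to 8 inference steps)

You can freely adjust the number of inference steps and the eta value to get the best results.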

import torch
from diffusers import DiffusionPipeline, TCDScheduler
from huggingface_hub import hf_hub_download
base_model_id = "runwayml/stable-diffusion-v1-5"
repo_name = "ByteDance/Hyper-SD"
ckpt_name = "Hyper-SD15-1step-lora.safetensors"
# Load the model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora()
# Use the TCD scheduler for better image quality
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
# For multi-step inference, a lower eta gives more detail
eta=1.0
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, eta=eta).images[0]

ControlNet Usage Examples

SDXL-related models

2-step, 4-step, and 8-step LoRAs

Using the Canny ControlNet with 2-step inference as an example:

import torch
from diffusers.utils import load_image
import numpy as np
import cv2
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL, DDIMScheduler
from huggingface_hub import hf_hub_download

# Load the original image
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")
image = np.array(image)
# Prepare the Canny control image
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
control_image = Image.fromarray(image)
control_image.save("control.png")
control_weight = 0.5  # recommended for good generalization

# Initialize the pipeline
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, vae=vae, torch_dtype=torch.float16).to("cuda")

pipe.load_lora_weights(hf_hub_download("ByteDance/Hyper-SD", "Hyper-SDXL-2steps-lora.safetensors"))
# Make sure the DDIM scheduler's timestep spacing is set to "trailing"!
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
pipe.fuse_lora()
image = pipe("A chocolate cookie", num_inference_steps=2, image=control_image, guidance_scale=0, controlnet_conditioning_scale=control_weight).images[0]
image.save('image_out.png')

Unified LoRA (supports 1 to 8 inference steps)

Using the Canny ControlNet as an example:

import torch
from diffusers.utils import load_image
import numpy as np
import cv2
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL, TCDScheduler
from huggingface_hub import hf_hub_download

# Load the original image
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")
image = np.array(image)
# Prepare the Canny control image
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
control_image = Image.fromarray(image)
control_image.save("control.png")
control_weight = 0.5  # recommended for good generalization

# Initialize the pipeline
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, vae=vae, torch_dtype=torch.float16).to("cuda")

# Load the Hyper-SDXL-1step LoRA
pipe.load_lora_weights(hf_hub_download("ByteDance/Hyper-SD", "Hyper-SDXL-1step-lora.safetensors"))
pipe.fuse_lora()
# Use the TCD scheduler for better image quality
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
# For multi-step inference, a lower eta gives more detail
eta=1.0
image = pipe("A chocolate cookie", num_inference_steps=4, image=control_image, guidance_scale=0, controlnet_conditioning_scale=control_weight, eta=eta).images[0]
image.save('image_out.png')

SD1.5-related models

2-step, 4-step, and 8-step LoRAs

Using the Canny ControlNet with 2-step inference as an example:

import torch
from diffusers.utils import load_image
import numpy as np
import cv2
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, DDIMScheduler

from huggingface_hub import hf_hub_download

controlnet_checkpoint = "lllyasviel/control_v11p_sd15_canny"

# Load the original image
image = load_image("https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/input.png")
image = np.array(image)
# Prepare the Canny control image
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
control_image = Image.fromarray(image)
control_image.save("control.png")

# Initialize the pipeline
controlnet = ControlNetModel.from_pretrained(controlnet_checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download("ByteDance/Hyper-SD", "Hyper-SD15-2steps-lora.safetensors"))
pipe.fuse_lora()
# Make sure the DDIM scheduler's timestep spacing is set to "trailing"!
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
image = pipe("a blue paradise bird in the jungle", num_inference_steps=2, image=control_image, guidance_scale=0).images[0]
image.save('image_out.png')

Unified LoRA (supports 1 to 8 inference steps)

Using the Canny ControlNet as an example:

import torch
from diffusers.utils import load_image
import numpy as np
import cv2
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, TCDScheduler
from huggingface_hub import hf_hub_download

controlnet_checkpoint = "lllyasviel/control_v11p_sd15_canny"

# Load the original image
image = load_image("https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/input.png")
image = np.array(image)
# Prepare the Canny control image
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
control_image = Image.fromarray(image)
control_image.save("control.png")

# Initialize the pipeline
controlnet = ControlNetModel.from_pretrained(controlnet_checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16).to("cuda")
# Load the Hyper-SD15-1step LoRA
pipe.load_lora_weights(hf_hub_download("ByteDance/Hyper-SD", "Hyper-SD15-1step-lora.safetensors"))
pipe.fuse_lora()
# Use the TCD scheduler for better image quality
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
# For multi-step inference, a lower eta gives more detail
eta=1.0
image = pipe("a blue paradise bird in the jungle", num_inference_steps=1, image=control_image, guidance_scale=0, eta=eta).images[0]
image.save('image_out.png')

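ComfyUI Usage Examples

Hyper-SDXL-Nsteps-lora.safetensors: text-to-image workflow
Hyper-SD15-Nsteps-lora.safetensors: text-to-image workflow
Hyper-SDXL-1step-Unet-Comfyui.fp16.safetensors: text-to-image workflow
- 1-step SDXL UNet requirement/installation: install our scheduler folder into your ComfyUI/custom_nodes so that sampling starts from timestep 800 instead of 999. In other words, make sure the folder ComfyUI/custom_nodes/ComfyUI-HyperSDXL1StepUnetScheduler exists. See our technical report for more details.
Hyper-SD15-1step-lora.safetensors: text-to-image workflow
Hyper-SDXL-1step-lora.safetensors: text-to-image workflow
- 1-step unified LoRA requirement/installation: install ComfyUI-TCD into your ComfyUI/custom_nodes to get the TCD scheduler, which supports different numbers of inference steps (1 to 8). In other words, make sure the folder ComfyUI/custom_nodes/ComfyUI-TCD exists. We recommend tuning the eta parameter of the TCD scheduler for better results.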
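
The two folder requirements above can be checked with a short script. This is a minimal sketch: the folder names come from the text, while the helper name `missing_custom_nodes` and the `"ComfyUI"` root path are placeholders to adapt to your own installation.

```python
from pathlib import Path

# Folder names taken from the requirements above.
REQUIRED_NODES = {
    "1-step SDXL UNet": "ComfyUI-HyperSDXL1StepUnetScheduler",
    "1-step unified LoRA": "ComfyUI-TCD",
}


def missing_custom_nodes(comfyui_root):
    """Return {workflow: folder} for required custom node folders that are absent."""
    custom_nodes = Path(comfyui_root) / "custom_nodes"
    return {
        workflow: folder
        for workflow, folder in REQUIRED_NODES.items()
        if not (custom_nodes / folder).is_dir()
    }


if __name__ == "__main__":
    # "ComfyUI" is a placeholder path; point it at your own installation.
    for workflow, folder in missing_custom_nodes("ComfyUI").items():
        print(f"{workflow}: missing custom_nodes/{folder}")
```

The check only confirms that the folders exist; it does not verify that ComfyUI loaded the nodes successfully.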

📚 Detailed Documentation

Model Checkpoints

Hyper-FLUX.1-dev-Nsteps-lora.safetensors: LoRA checkpoint for FLUX.1-dev-related models.
Hyper-SD3-Nsteps-CFG-lora.safetensors: LoRA checkpoint for SD3-related models.
Hyper-SDXL-Nstep-lora.safetensors: LoRA checkpoint for SDXL-related models.
Hyper-SD15-Nstep-lora.safetensors: LoRA checkpoint for SD1.5-related models.
Hyper-SDXL-1step-Unet.safetensors: UNet checkpoint distilled from SDXL-Base.
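
The LoRA file names used in the examples follow a regular pattern, which the sketch below captures. The helper `lora_checkpoint_name` is hypothetical. Only the names that appear in this document are confirmed, so treat any other combination as a guess and check it against the repository's file listing before downloading.

```python
def lora_checkpoint_name(base, steps, cfg=False):
    """Build a Hyper-SD LoRA file name following the pattern of the names above.

    base: one of "FLUX.1-dev", "SD3", "SDXL", "SD15".
    steps: number of inference steps the LoRA was distilled for.
    cfg: whether to request the CFG-preserved variant.
    """
    if base not in {"FLUX.1-dev", "SD3", "SDXL", "SD15"}:
        raise ValueError(f"unknown base model: {base}")
    if steps < 1:
        raise ValueError("steps must be >= 1")
    step_part = "1step" if steps == 1 else f"{steps}steps"
    # The SD3 LoRA named in this document always carries the CFG tag.
    cfg_part = "-CFG" if (cfg or base == "SD3") else ""
    return f"Hyper-{base}-{step_part}{cfg_part}-lora.safetensors"


print(lora_checkpoint_name("FLUX.1-dev", 8))  # Hyper-FLUX.1-dev-8steps-lora.safetensors
print(lora_checkpoint_name("SDXL", 1))        # Hyper-SDXL-1step-lora.safetensors
```

The result can be passed as `ckpt_name` to `hf_hub_download(repo_name, ckpt_name)` in the examples above.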

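News 🔥🔥🔥

August 26, 2024. 💥💥💥 Our 8-step and 16-step FLUX.1-dev-related LoRAs are now available! We recommend a LoRA scale of around 0.125, which matches training, and the guidance scale can stay at 3.5. Lower-step LoRAs are coming soon. 💥💥💥
August 19, 2024. SD3-related CFG LoRAs are now available! We recommend setting the guidance scale to 3.0/5.0/7.0 at 4/8/16 steps. Before running inference with diffusers, don't forget to fuse the LoRA at a relatively small scale (for example 0.125, which matches training). Note that the 8-step and 16-step LoRAs can also run inference at slightly fewer steps, such as 6 and 12 steps respectively. We'd love to hear your feedback; FLUX-related models will follow next week.
May 13, 2024. The 12-step CFG-preserved Hyper-SDXL-12steps-CFG-LoRA and Hyper-SD15-12steps-CFG-LoRA are now available (supporting guidance scales of 5 to 8). They offer a better trade-off between performance and speed and are more practical. Enjoy!
April 30, 2024. Our 8-step CFG-preserved Hyper-SDXL-8steps-CFG-LoRA and Hyper-SD15-8steps-CFG-LoRA are now available (supporting guidance scales of 5 to 8). We strongly recommend making the 8-step CFG LoRA the standard configuration for all SDXL and SD15 models!!!
April 28, 2024. ComfyUI workflows for inference at different step counts with the 1-step unified LoRA 🥰 and the TCD scheduler are released! Remember to install ⭕️ ComfyUI-TCD in your ComfyUI/custom_nodes folder! You're encouraged to tune the eta parameter for better results 🌟!
April 26, 2024. Thanks to @Pete for contributing a larger canvas to our scribble demo 👏.
April 24, 2024. The ComfyUI workflow and checkpoint ✨ for the 1-step SDXL UNet are also available! Don't forget ⭕️ to install the custom scheduler in your ComfyUI/custom_nodes folder!!!
April 23, 2024. ComfyUI workflows for the N-step LoRAs are released! Worth a try for creators 💥!
April 23, 2024. Our technical report 📚 has been uploaded to arXiv! It provides many implementation details, and we welcome further discussion 👏.
April 21, 2024. Hyper-SD ⚡️ is highly compatible with different base models and ControlNets and works well with them. To make this concrete, we have also added ControlNet usage examples here.
April 20, 2024. Our checkpoints and two demos 🤗 (SD15-Scribble and SDXL-T2I) are publicly available in our HuggingFace repo.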
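
The scale recommendations in the August 2024 news items can be collected into a small lookup. This is only a convenience sketch that transcribes the values stated above; the function and constant names are hypothetical, and nothing is implied for combinations the news items do not mention.

```python
# Recommendations transcribed from the news items above.
SD3_CFG_GUIDANCE = {4: 3.0, 8: 5.0, 16: 7.0}
FLUX_GUIDANCE = 3.5
RECOMMENDED_LORA_SCALE = 0.125  # stated for both FLUX.1-dev and SD3 LoRAs


def recommended_settings(base, steps):
    """Return (lora_scale, guidance_scale) for a FLUX.1-dev or SD3 Hyper-SD LoRA."""
    if base == "FLUX.1-dev":
        if steps not in (8, 16):
            raise ValueError("only 8-step and 16-step FLUX.1-dev LoRAs are mentioned")
        return RECOMMENDED_LORA_SCALE, FLUX_GUIDANCE
    if base == "SD3":
        if steps not in SD3_CFG_GUIDANCE:
            raise ValueError("SD3 CFG LoRAs are mentioned for 4, 8 and 16 steps")
        return RECOMMENDED_LORA_SCALE, SD3_CFG_GUIDANCE[steps]
    raise ValueError(f"no recommendation recorded for base model: {base}")


print(recommended_settings("SD3", 8))          # (0.125, 5.0)
print(recommended_settings("FLUX.1-dev", 16))  # (0.125, 3.5)
```

The first value corresponds to `lora_scale` in `pipe.fuse_lora(...)` and the second to `guidance_scale` in the pipeline call.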

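🔧 Technical Details

For the technical details of this project, see our technical report on arXiv, which provides many implementation details.

📄 Citation

If you use this project in your research, please cite it with the following BibTeX entry: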

@misc{ren2024hypersd,
      title={Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis}, 
      author={Yuxi Ren and Xin Xia and Yanzuo Lu and Jiacheng Zhang and Jie Wu and Pan Xie and Xing Wang and Xuefeng Xiao},
      year={2024},
      eprint={2404.13686},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}