Hyper-SD: Open-Source Diffusion Model Acceleration with Free, Fast Inference Across Multiple Base Models

Hyper SD

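Developed by ByteDance

Hyper-SD is an advanced diffusion model acceleration technique that supports fast inference on several base models, including FLUX.1-dev, SD3, SDXL, and SD1.5.

Image generation #Ultra-fast image generation #Low-step inference #LoRA fine-tuning

Downloads: 129.08k

Released: April 20, 2024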

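Model Overview

Hyper-SD is one of the most recent diffusion model acceleration techniques. It combines LoRA with distillation to generate high-quality images quickly.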

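Model Features

Efficient inference

Supports fast inference in 1 to 16 steps, which significantly speeds up generation.

Multi-model compatibility

Compatible with several base models, including FLUX.1-dev, SD3, SDXL, and SD1.5.

High-quality generation

Maintains high image quality even with very few inference steps.

Flexible configuration

The LoRA scale and guidance scale can be adjusted to suit different needs.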

Model Capabilities

Text-to-image generation

Fast inference

High-quality image generation

Multi-model compatibility

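Use Cases

Creative design

Rapid concept design

Designers can quickly generate multiple concept sketches to speed up the creative process.

Generates high-quality concept images in 1 to 8 steps

Content creation

Social media content generation

Quickly generate visual content suited to social media.

Efficiently produces varied visual assets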

🚀 Hyper-SD

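Hyper-SD is one of the most recent diffusion model acceleration techniques. This project releases models distilled from several base models for efficient text-to-image generation.

🚀 Quick Start

The official paper for this project is Hyper-SD (arXiv:2404.13686), and the project page is https://hyper-sd.github.io/ .

You can also try our demos on Hugging Face.

✨ Key Features

Multi-model support: releases models distilled from FLUX.1-dev, SD3-Medium, SDXL Base 1.0, and Stable-Diffusion v1-5.
LoRAs for multiple step counts: provides LoRA models for different step counts, such as 8 and 16 steps, and supports different guidance scale settings.
ControlNet compatibility: highly compatible with different base models and ControlNets.

📦 Installation

To use the models in this project, install the required dependencies, such as diffusers and torch: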

pip install diffusers torch huggingface_hub safetensors
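
Before running the examples, it can help to confirm that the packages from the command above are visible in the current environment. The sketch below uses only the Python standard library; `installed_versions` and `REQUIRED` are hypothetical names introduced for illustration and are not part of any of these packages.

```python
from importlib import metadata

# Distribution names taken from the pip command above.
REQUIRED = ["diffusers", "torch", "huggingface_hub", "safetensors"]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any entry reported as missing should be installed before loading a pipeline.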

💻 Usage Examples

Text-to-Image Usage

FLUX.1-dev-related models

import torch
from diffusers import FluxPipeline
from huggingface_hub import hf_hub_download
base_model_id = "black-forest-labs/FLUX.1-dev"
repo_name = "ByteDance/Hyper-SD"
# Use the 8-step LoRA as an example
ckpt_name = "Hyper-FLUX.1-dev-8steps-lora.safetensors"
# Load the model. Fill in your access token, because the FLUX.1-dev repository is gated.
pipe = FluxPipeline.from_pretrained(base_model_id, token="xxx")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=0.125)
pipe.to("cuda", dtype=torch.float16)
image=pipe(prompt="a photo of a cat", num_inference_steps=8, guidance_scale=3.5).images[0]
image.save("output.png")

SD3-related models

import torch
from diffusers import StableDiffusion3Pipeline
from huggingface_hub import hf_hub_download
base_model_id = "stabilityai/stable-diffusion-3-medium-diffusers"
repo_name = "ByteDance/Hyper-SD"
# Use the 8-step LoRA as an example
ckpt_name = "Hyper-SD3-8steps-CFG-lora.safetensors"
# Load the model. Fill in your access token, because the SD3 repository is gated.
pipe = StableDiffusion3Pipeline.from_pretrained(base_model_id, token="xxx")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=0.125)
pipe.to("cuda", dtype=torch.float16)
image=pipe(prompt="a photo of a cat", num_inference_steps=8, guidance_scale=5.0).images[0]
image.save("output.png")

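SDXL-related models

2-step, 4-step, and 8-step LoRAs

The example below uses the 2-step LoRA. To use another LoRA, set the number of inference steps to match it.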

import torch
from diffusers import DiffusionPipeline, DDIMScheduler
from huggingface_hub import hf_hub_download
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "ByteDance/Hyper-SD"
# Use the 2-step LoRA as an example
ckpt_name = "Hyper-SDXL-2steps-lora.safetensors"
# Load the model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora()
# Make sure the DDIM scheduler's timestep spacing is set to "trailing"!
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=2, guidance_scale=0).images[0]
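
To see what the `trailing` setting changes at very low step counts, the sketch below reproduces the arithmetic of two spacing modes in plain Python. It is a simplified illustration modeled on how the diffusers `DDIMScheduler` picks timesteps, assuming 1000 training timesteps and a step offset of 0. The function names are hypothetical.

```python
def trailing_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Timesteps counted backwards from the last training timestep."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [round(num_train_timesteps - i * step_ratio) - 1
            for i in range(num_inference_steps)]


def leading_timesteps(num_inference_steps, num_train_timesteps=1000, steps_offset=0):
    """Timesteps counted forwards from zero, listed in descending order."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio + steps_offset
            for i in reversed(range(num_inference_steps))]


print(trailing_timesteps(2))  # [999, 499]
print(leading_timesteps(2))   # [500, 0]
```

With two steps, `trailing` starts sampling at timestep 999, the noisiest end of the schedule, whereas `leading` would start at 500. That difference is a plausible reason the examples insist on `trailing` for the N-step LoRAs.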

Unified LoRA (supports 1 to 8 inference steps)

You can adjust the number of inference steps and the eta value to get the best results.

import torch
from diffusers import DiffusionPipeline, TCDScheduler
from huggingface_hub import hf_hub_download
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "ByteDance/Hyper-SD"
ckpt_name = "Hyper-SDXL-1step-lora.safetensors"
# Load the model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora()
# Use the TCD scheduler for better image quality
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
# For multi-step inference, a lower eta yields more detail
eta=1.0
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, eta=eta).images[0]
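
Because the best eta may vary with the step count, one practical way to tune it is to render the same prompt at several eta values and compare the results. The helper below is a sketch under stated assumptions: `pipe` is the TCD-configured pipeline from the example above, while `sweep_eta`, its default eta values, and the `generator_factory` argument are hypothetical and introduced only for illustration.

```python
def sweep_eta(pipe, prompt, num_inference_steps, etas=(1.0, 0.7, 0.5, 0.3),
              generator_factory=None):
    """Run the same prompt at several eta values and return {eta: image}."""
    results = {}
    for eta in etas:
        kwargs = dict(prompt=prompt, num_inference_steps=num_inference_steps,
                      guidance_scale=0, eta=eta)
        if generator_factory is not None:
            # A fresh, identically seeded generator per run keeps the starting
            # noise the same, so only eta differs between images.
            kwargs["generator"] = generator_factory()
        results[eta] = pipe(**kwargs).images[0]
    return results


# Example usage (needs the pipeline above and a GPU):
# import torch
# images = sweep_eta(pipe, "a photo of a cat", 4,
#                    generator_factory=lambda: torch.Generator("cuda").manual_seed(0))
# for eta, img in images.items():
#     img.save(f"cat_eta_{eta}.png")
```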

1-step SDXL UNet

For single-step inference only.

import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "ByteDance/Hyper-SD"
ckpt_name = "Hyper-SDXL-1step-Unet.safetensors"
# Load the model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo_name, ckpt_name), device="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
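# Use the LCM scheduler instead of DDIM so that specific timesteps can be passed in
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
# For single-step inference, start from timestep 800 to get better results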
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, timesteps=[800]).images[0]

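SD1.5-related models

2-step, 4-step, and 8-step LoRAs

The example below uses the 2-step LoRA. To use another LoRA, set the number of inference steps to match it.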

import torch
from diffusers import DiffusionPipeline, DDIMScheduler
from huggingface_hub import hf_hub_download
base_model_id = "runwayml/stable-diffusion-v1-5"
repo_name = "ByteDance/Hyper-SD"
# Use the 2-step LoRA as an example
ckpt_name = "Hyper-SD15-2steps-lora.safetensors"
# Load the model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora()
# Make sure the DDIM scheduler's timestep spacing is set to "trailing"!
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=2, guidance_scale=0).images[0]

Unified LoRA (supports 1 to 8 inference steps)

You can adjust the number of inference steps and the eta value to get the best results.

import torch
from diffusers import DiffusionPipeline, TCDScheduler
from huggingface_hub import hf_hub_download
base_model_id = "runwayml/stable-diffusion-v1-5"
repo_name = "ByteDance/Hyper-SD"
ckpt_name = "Hyper-SD15-1step-lora.safetensors"
# Load the model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora()
# Use the TCD scheduler for better image quality
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
# For multi-step inference, a lower eta yields more detail
eta=1.0
prompt="a photo of a cat"
image=pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, eta=eta).images[0]

ControlNet Usage

SDXL-related models

2-step, 4-step, and 8-step LoRAs

Using the Canny ControlNet with 2-step inference as an example:

import torch
from diffusers.utils import load_image
import numpy as np
import cv2
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL, DDIMScheduler
from huggingface_hub import hf_hub_download

# Load the original image
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")
image = np.array(image)
# Prepare the Canny control image
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
control_image = Image.fromarray(image)
control_image.save("control.png")
control_weight = 0.5  # recommended for good generalization

# Initialize the pipeline
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, vae=vae, torch_dtype=torch.float16).to("cuda")

pipe.load_lora_weights(hf_hub_download("ByteDance/Hyper-SD", "Hyper-SDXL-2steps-lora.safetensors"))
# Make sure the DDIM scheduler's timestep spacing is set to "trailing"!
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
pipe.fuse_lora()
image = pipe("A chocolate cookie", num_inference_steps=2, image=control_image, guidance_scale=0, controlnet_conditioning_scale=control_weight).images[0]
image.save('image_out.png')

Unified LoRA (supports 1 to 8 inference steps)

Using the Canny ControlNet as an example:

import torch
from diffusers.utils import load_image
import numpy as np
import cv2
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL, TCDScheduler
from huggingface_hub import hf_hub_download

# Load the original image
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png")
image = np.array(image)
# Prepare the Canny control image
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
control_image = Image.fromarray(image)
control_image.save("control.png")
control_weight = 0.5  # recommended for good generalization

# Initialize the pipeline
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet, vae=vae, torch_dtype=torch.float16).to("cuda")

# Load the Hyper-SDXL-1step LoRA
pipe.load_lora_weights(hf_hub_download("ByteDance/Hyper-SD", "Hyper-SDXL-1step-lora.safetensors"))
pipe.fuse_lora()
# Use the TCD scheduler for better image quality
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
# For multi-step inference, a lower eta yields more detail
eta=1.0
image = pipe("A chocolate cookie", num_inference_steps=4, image=control_image, guidance_scale=0, controlnet_conditioning_scale=control_weight, eta=eta).images[0]
image.save('image_out.png')

SD1.5-related models

2-step, 4-step, and 8-step LoRAs

Using the Canny ControlNet with 2-step inference as an example:

import torch
from diffusers.utils import load_image
import numpy as np
import cv2
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, DDIMScheduler

from huggingface_hub import hf_hub_download

controlnet_checkpoint = "lllyasviel/control_v11p_sd15_canny"

# Load the original image
image = load_image("https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/input.png")
image = np.array(image)
# Prepare the Canny control image
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
control_image = Image.fromarray(image)
control_image.save("control.png")

# Initialize the pipeline
controlnet = ControlNetModel.from_pretrained(controlnet_checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download("ByteDance/Hyper-SD", "Hyper-SD15-2steps-lora.safetensors"))
pipe.fuse_lora()
# Make sure the DDIM scheduler's timestep spacing is set to "trailing"!
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
image = pipe("a blue paradise bird in the jungle", num_inference_steps=2, image=control_image, guidance_scale=0).images[0]
image.save('image_out.png')

Unified LoRA (supports 1 to 8 inference steps)

Using the Canny ControlNet as an example:

import torch
from diffusers.utils import load_image
import numpy as np
import cv2
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, TCDScheduler
from huggingface_hub import hf_hub_download

controlnet_checkpoint = "lllyasviel/control_v11p_sd15_canny"

# Load the original image
image = load_image("https://huggingface.co/lllyasviel/control_v11p_sd15_canny/resolve/main/images/input.png")
image = np.array(image)
# Prepare the Canny control image
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
control_image = Image.fromarray(image)
control_image.save("control.png")

# Initialize the pipeline
controlnet = ControlNetModel.from_pretrained(controlnet_checkpoint, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16).to("cuda")
# Load the Hyper-SD15-1step LoRA
pipe.load_lora_weights(hf_hub_download("ByteDance/Hyper-SD", "Hyper-SD15-1step-lora.safetensors"))
pipe.fuse_lora()
# Use the TCD scheduler for better image quality
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)
# For multi-step inference, a lower eta yields more detail
eta=1.0
image = pipe("a blue paradise bird in the jungle", num_inference_steps=1, image=control_image, guidance_scale=0, eta=eta).images[0]
image.save('image_out.png')

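ComfyUI Usage

Hyper-SDXL-Nsteps-lora.safetensors: text-to-image workflow
Hyper-SD15-Nsteps-lora.safetensors: text-to-image workflow
Hyper-SDXL-1step-Unet-Comfyui.fp16.safetensors: text-to-image workflow
- Requirement/installation for the 1-step SDXL UNet: install our scheduler folder into your ComfyUI/custom_nodes so that sampling starts from timestep 800 instead of 999. In other words, make sure the folder ComfyUI/custom_nodes/ComfyUI-HyperSDXL1StepUnetScheduler exists. See our technical report for more details.
Hyper-SD15-1step-lora.safetensors: text-to-image workflow
Hyper-SDXL-1step-lora.safetensors: text-to-image workflow
- Requirement/installation for the 1-step unified LoRA: install ComfyUI-TCD into your ComfyUI/custom_nodes to use the TCD scheduler, which supports different numbers of inference steps (1 to 8). In other words, make sure the folder ComfyUI/custom_nodes/ComfyUI-TCD exists. We recommend tuning the eta parameter of the TCD scheduler for better results.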
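
The two folder requirements above can be checked before launching ComfyUI. The following is a minimal sketch using only the standard library. The folder names come from the text above, while `missing_custom_nodes`, `REQUIRED_NODES`, and the example path are hypothetical.

```python
from pathlib import Path

# Custom-node folder names as stated in the requirements above.
REQUIRED_NODES = {
    "1-step SDXL UNet": "ComfyUI-HyperSDXL1StepUnetScheduler",
    "1-step unified LoRA": "ComfyUI-TCD",
}


def missing_custom_nodes(comfyui_root):
    """Return {workflow: folder} for every required custom node that is absent."""
    custom_nodes = Path(comfyui_root) / "custom_nodes"
    return {
        workflow: folder
        for workflow, folder in REQUIRED_NODES.items()
        if not (custom_nodes / folder).is_dir()
    }


# Example usage (the path is a placeholder for your own ComfyUI checkout):
# for workflow, folder in missing_custom_nodes("ComfyUI").items():
#     print(f"{workflow}: please install {folder} into ComfyUI/custom_nodes")
```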

📚 Documentation

Model Checkpoints

Hyper-FLUX.1-dev-Nsteps-lora.safetensors: LoRA checkpoint for FLUX.1-dev-related models.
Hyper-SD3-Nsteps-CFG-lora.safetensors: LoRA checkpoint for SD3-related models.
Hyper-SDXL-Nstep-lora.safetensors: LoRA checkpoint for SDXL-related models.
Hyper-SD15-Nstep-lora.safetensors: LoRA checkpoint for SD1.5-related models.
Hyper-SDXL-1step-Unet.safetensors: UNet checkpoint distilled from SDXL-Base.
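
The naming patterns above, together with the file names used in the usage examples, suggest a simple way to build a LoRA file name from a base model and a step count. The helper below is an illustrative sketch: `lora_checkpoint_name` and its short base-model keys are hypothetical, the `-CFG` variant for SDXL and SD1.5 is an assumption extrapolated from the SD3 pattern, and the function does not check that a given file is actually published in the repository.

```python
# Prefixes taken from the checkpoint list above; the short keys are hypothetical.
PREFIXES = {
    "flux": "Hyper-FLUX.1-dev",
    "sd3": "Hyper-SD3",
    "sdxl": "Hyper-SDXL",
    "sd15": "Hyper-SD15",
}


def lora_checkpoint_name(base, steps, cfg=False):
    """Build a LoRA file name such as 'Hyper-SDXL-2steps-lora.safetensors'."""
    if base not in PREFIXES:
        raise ValueError(f"unknown base model key: {base!r}")
    unit = "step" if steps == 1 else "steps"
    # The SD3 LoRAs in the list above always carry the CFG tag.
    tag = "-CFG" if (cfg or base == "sd3") else ""
    return f"{PREFIXES[base]}-{steps}{unit}{tag}-lora.safetensors"


print(lora_checkpoint_name("sdxl", 2))  # Hyper-SDXL-2steps-lora.safetensors
print(lora_checkpoint_name("sd3", 8))   # Hyper-SD3-8steps-CFG-lora.safetensors
```

The returned name can be passed as the second argument of `hf_hub_download`, as in the usage examples.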

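News 🔥🔥🔥

August 26, 2024. 💥💥💥 Our 8-step and 16-step FLUX.1-dev-related LoRAs are now available! We recommend a LoRA scale of about 0.125, which matches training, and the guidance scale can stay at 3.5. Lower-step LoRAs are coming soon. 💥💥💥
August 19, 2024. The SD3-related CFG LoRAs are now available! We recommend guidance scales of 3.0/5.0/7.0 at 4/8/16 steps. Before running inference with diffusers, remember to fuse the LoRA at a relatively small scale (for example 0.125, which matches training). Note that the 8-step and 16-step LoRAs can also run at slightly fewer steps, such as 6 and 12 respectively. We look forward to your feedback. FLUX-related models will follow next week.
May 13, 2024. The 12-step CFG-preserved Hyper-SDXL-12steps-CFG-LoRA and Hyper-SD15-12steps-CFG-LoRA are now available (supporting guidance scales of 5 to 8). They offer a better trade-off between performance and speed and are more practical. Enjoy!
April 30, 2024. Our 8-step CFG-preserved Hyper-SDXL-8steps-CFG-LoRA and Hyper-SD15-8steps-CFG-LoRA are now available (supporting guidance scales of 5 to 8). We strongly recommend making the 8-step CFG LoRA the standard configuration for all SDXL and SD15 models!
April 28, 2024. ComfyUI workflows that run inference at different step counts with the 1-step unified LoRA 🥰 and the TCD scheduler have been released! Remember to install ⭕️ ComfyUI-TCD in your ComfyUI/custom_nodes folder! You are encouraged to tune the eta parameter for better results 🌟!
April 26, 2024. Thanks to @Pete for contributing a larger canvas to our scribble demo 👏.
April 24, 2024. The ComfyUI workflow and checkpoint ✨ for the 1-step SDXL UNet are also available! Don't forget ⭕️ to install the custom scheduler in your ComfyUI/custom_nodes folder!
April 23, 2024. ComfyUI workflows based on the N-step LoRAs have been released! Worth a try for creators 💥!
April 23, 2024. Our technical report 📚 has been uploaded to arXiv! It provides many implementation details, and further discussion is welcome 👏.
April 21, 2024. Hyper-SD ⚡️ is highly compatible with different base models and ControlNets and works well with them. To make this concrete, we have also added ControlNet usage examples here.
April 20, 2024. Our checkpoints and two demos 🤗 (SD15-Scribble and SDXL-T2I) are publicly available in the HuggingFace repository.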
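
The LoRA scale and guidance scale recommendations in the FLUX.1-dev and SD3 announcements above can be collected into a small lookup table. This is only a convenience sketch: the values are copied from those two announcements, while `RECOMMENDED`, `recommended_settings`, and the short base-model keys are hypothetical names.

```python
# (base model key, steps) -> settings quoted in the announcements above.
RECOMMENDED = {
    ("flux", 8): {"lora_scale": 0.125, "guidance_scale": 3.5},
    ("flux", 16): {"lora_scale": 0.125, "guidance_scale": 3.5},
    ("sd3", 4): {"lora_scale": 0.125, "guidance_scale": 3.0},
    ("sd3", 8): {"lora_scale": 0.125, "guidance_scale": 5.0},
    ("sd3", 16): {"lora_scale": 0.125, "guidance_scale": 7.0},
}


def recommended_settings(base, steps):
    """Return a copy of the recommended settings, or raise if none are listed."""
    try:
        return dict(RECOMMENDED[(base, steps)])
    except KeyError:
        raise ValueError(f"no recommendation listed for {base!r} at {steps} steps") from None


settings = recommended_settings("sd3", 8)
print(settings)  # {'lora_scale': 0.125, 'guidance_scale': 5.0}
```

In the diffusers examples, `lora_scale` corresponds to the argument of `pipe.fuse_lora(...)` and `guidance_scale` to the argument of the pipeline call.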

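🔧 Technical Details

For technical details, see our technical report on arXiv, which provides many implementation details.

📄 Citation

If you use this project in your research, please cite it with the following BibTeX entry: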

@misc{ren2024hypersd,
      title={Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis}, 
      author={Yuxi Ren and Xin Xia and Yanzuo Lu and Jiacheng Zhang and Jie Wu and Pan Xie and Xing Wang and Xuefeng Xiao},
      year={2024},
      eprint={2404.13686},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}