DMD2 Open-Source Image Synthesis Model: Fast, Efficient Image Generation with Improved Distillation

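DMD2

Developed by tianweiy

DMD2 is a fast image synthesis technique built on diffusion models. It uses an improved distribution matching distillation method to generate images efficiently.

Image generation #4-step fast image generation #1-step ultra-fast synthesis #Diffusion distillation

Downloads: 39.89k

Released: 5/23/2024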

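Model overview

DMD2 distills Stable Diffusion XL with improved distribution matching distillation, targeting fast image synthesis. The distilled models generate high-quality images in 4 inference steps, or even 1, which substantially reduces generation time.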

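Model features

Fast image synthesis

Generates high-quality images in 4 inference steps, or even 1, which substantially reduces generation time

Multiple inference modes

Can be used as a full UNet, as a LoRA, or together with a T2I adapter

High-quality output

Maintains high image quality even with a drastically reduced number of inference steps

Broad compatibility

Works with the Stable Diffusion XL base model and a range of adapters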

Model capabilities

Text-to-image generation

Fast image synthesis

Image style transfer

Conditional image generation

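Use cases

Creative design

Concept art

Quickly generate concept art in a variety of styles

High-quality concept images in 4 steps or fewer

Product design prototypes

Quickly generate visual prototypes for product design

Iterate on design concepts efficiently

Content creation

Social media content

Quickly generate visual content for social media

Fast turnaround of high-quality images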

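🚀 DMD2 model card

DMD2 uses improved distribution matching distillation for fast image synthesis. It combines efficient generation with high image quality and is intended to support related research and applications.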

Improved Distribution Matching Distillation for Fast Image Synthesis
Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman

📞 Contact

If you have any questions about the paper, feel free to contact us!

Tianwei Yin tianweiy@mit.edu

💻 Usage examples

Basic usage

The models can be used with the standard diffusers pipeline:

4-step UNet generation

import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_unet_fp16.bin"
# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt = "a photo of a cat"

# LCMScheduler's default timesteps differ from the ones used in training, so pass them explicitly.
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]
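
The four timesteps passed above are spaced 250 apart, counting down from 999. As a minimal sketch, assuming the schedule is simply an even split of 1000 training timesteps, the list can be reproduced as follows. The helper name `evenly_spaced_timesteps` is hypothetical and is not part of diffusers or the DMD2 repository.

```python
def evenly_spaced_timesteps(num_steps, num_train_timesteps=1000):
    """Return `num_steps` descending timesteps, evenly spaced, starting at the last one.

    Hypothetical helper for illustration only.
    """
    if not 1 <= num_steps <= num_train_timesteps:
        raise ValueError("num_steps must be between 1 and num_train_timesteps")
    stride = num_train_timesteps // num_steps
    return [num_train_timesteps - 1 - i * stride for i in range(num_steps)]


four_step = evenly_spaced_timesteps(4)
print(four_step)  # [999, 749, 499, 249]
```

The 1-step checkpoint in this card uses the single timestep 399 rather than 999, so this helper only mirrors the 4-step setting.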

4-step LoRA generation

import torch
from diffusers import DiffusionPipeline, LCMScheduler
from huggingface_hub import hf_hub_download
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_lora_fp16.safetensors"
# Load model.
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))
pipe.fuse_lora(lora_scale=1.0)  # we might want to make the scale smaller for community models

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt = "a photo of a cat"

# LCMScheduler's default timesteps differ from the ones used in training, so pass them explicitly.
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0, timesteps=[999, 749, 499, 249]).images[0]
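
To illustrate what the `lora_scale` argument controls, here is a toy sketch in plain Python. It assumes the usual LoRA formulation, in which fusing adds a scaled low-rank product to a base weight matrix. The function `fuse_low_rank_update` and the 2x2 example are hypothetical, and the sketch omits details such as the alpha/rank factor that a real implementation may apply.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_low_rank_update(weight, lora_up, lora_down, lora_scale=1.0):
    """Return weight + lora_scale * (lora_up @ lora_down) as a new matrix."""
    delta = matmul(lora_up, lora_down)
    return [
        [w + lora_scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight (2x2)
lora_up = [[1.0], [2.0]]           # 2x1 factor
lora_down = [[0.5, 0.5]]           # 1x2 factor

full = fuse_low_rank_update(weight, lora_up, lora_down, lora_scale=1.0)
half = fuse_low_rank_update(weight, lora_up, lora_down, lora_scale=0.5)
print(full)  # [[1.5, 0.5], [1.0, 2.0]]
print(half)  # [[1.25, 0.25], [0.5, 1.5]]
```

A smaller scale keeps the fused weights closer to the base model, which is the motivation behind the comment about community models in the example above.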

1-step UNet generation

import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
from huggingface_hub import hf_hub_download
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_1step_unet_fp16.bin"
# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))
pipe = DiffusionPipeline.from_pretrained(base_model_id, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt = "a photo of a cat"
image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0, timesteps=[399]).images[0]
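
The examples above differ only in the checkpoint name and the sampling arguments. A small lookup table, sketched here with a hypothetical helper `sampling_kwargs`, keeps those pairs together so that each checkpoint is always called with its matching timesteps. The values are copied from the examples in this card; nothing beyond them is assumed.

```python
# Sampling settings per checkpoint, copied from the examples in this card.
DMD2_SAMPLING = {
    "dmd2_sdxl_4step_unet_fp16.bin": [999, 749, 499, 249],
    "dmd2_sdxl_4step_lora_fp16.safetensors": [999, 749, 499, 249],
    "dmd2_sdxl_1step_unet_fp16.bin": [399],
}


def sampling_kwargs(ckpt_name):
    """Return the pipeline call arguments used in this card for `ckpt_name`.

    Hypothetical helper for illustration; not part of diffusers or DMD2.
    """
    if ckpt_name not in DMD2_SAMPLING:
        raise ValueError(f"no sampling settings recorded for {ckpt_name!r}")
    timesteps = list(DMD2_SAMPLING[ckpt_name])
    return {
        "num_inference_steps": len(timesteps),
        "guidance_scale": 0,  # every example in this card disables guidance
        "timesteps": timesteps,
    }


print(sampling_kwargs("dmd2_sdxl_1step_unet_fp16.bin"))
```

With a loaded pipeline, the call then becomes `pipe(prompt=prompt, **sampling_kwargs(ckpt_name)).images[0]`.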

4-step T2I adapter

from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, AutoencoderKL, UNet2DConditionModel, LCMScheduler
from diffusers.utils import load_image, make_image_grid
from controlnet_aux.canny import CannyDetector
from huggingface_hub import hf_hub_download
import torch

# load adapter
adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-canny-sdxl-1.0", torch_dtype=torch.float16, variant="fp16").to("cuda")

vae=AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo_name = "tianweiy/DMD2"
ckpt_name = "dmd2_sdxl_4step_unet_fp16.bin"
# Load model.
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(torch.load(hf_hub_download(repo_name, ckpt_name), map_location="cuda"))

pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    base_model_id, unet=unet, vae=vae, adapter=adapter, torch_dtype=torch.float16, variant="fp16", 
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
# Optional memory optimization; requires the xformers package to be installed.
pipe.enable_xformers_memory_efficient_attention()

canny_detector = CannyDetector()

url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg"
image = load_image(url)

# Detect the canny map in low resolution to avoid high-frequency details
image = canny_detector(image, detect_resolution=384, image_resolution=1024)

prompt = "Mystical fairy in real, magic, 4k picture, high quality"

gen_images = pipe(
    prompt=prompt,
    image=image,
    num_inference_steps=4,
    guidance_scale=0,
    adapter_conditioning_scale=0.8,
    adapter_conditioning_factor=0.5,
    timesteps=[999, 749, 499, 249],
).images[0]
gen_images.save('out_canny.png')
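
With only 4 steps, `adapter_conditioning_factor=0.5` has a coarse effect. As a sketch, assuming the factor is the fraction of inference steps, counted from the start, during which the adapter features are applied (which is how the diffusers documentation describes it), the hypothetical helper below lists the step indices that would receive conditioning.

```python
def steps_with_adapter(num_inference_steps, adapter_conditioning_factor):
    """Return the step indices that receive adapter conditioning.

    Hypothetical helper; assumes conditioning is applied while
    step_index < int(num_inference_steps * adapter_conditioning_factor).
    """
    cutoff = int(num_inference_steps * adapter_conditioning_factor)
    return [i for i in range(num_inference_steps) if i < cutoff]


print(steps_with_adapter(4, 0.5))  # [0, 1]
print(steps_with_adapter(4, 1.0))  # [0, 1, 2, 3]
```

Under that assumption, the example above conditions on the canny map only at timesteps 999 and 749, leaving the last two steps unconstrained.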

For more information, please refer to the code repository.

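📄 License

Improved Distribution Matching Distillation is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

📚 Citation

If you find DMD2 useful or relevant to your research, please cite our papers: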

@article{yin2024improved,
    title={Improved Distribution Matching Distillation for Fast Image Synthesis},
    author={Yin, Tianwei and Gharbi, Micha{\"e}l and Park, Taesung and Zhang, Richard and Shechtman, Eli and Durand, Fr{\'e}do and Freeman, William T},
    journal={arXiv:2405.14867},
    year={2024}
}

@inproceedings{yin2024onestep,
    title={One-step Diffusion with Distribution Matching Distillation},
    author={Yin, Tianwei and Gharbi, Micha{\"e}l and Zhang, Richard and Shechtman, Eli and Durand, Fr{\'e}do and Freeman, William T and Park, Taesung},
    booktitle={CVPR},
    year={2024}
}

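🙏 Acknowledgments

This work was done while Tianwei Yin was a full-time student at MIT. It was developed based on our reimplementation of the original DMD paper. This work was supported by the National Science Foundation under Cooperative Agreement PHY-2019786 (the NSF AI Institute for Artificial Intelligence and Fundamental Interactions, http://iaifi.org/), by NSF Grant 2105819, by NSF CISE award 1955864, and by funding from Google, GIST, Amazon, and Quanta Computer.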