TDM_CogVideoX-2B_LoRA: open-source video generation model, 4-step inference, 25× faster high-quality video generation


TDM CogVideoX 2B LoRA

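Developed by Luo-Yihong

TDM is a model that achieves efficient few-step diffusion through trajectory distribution matching. It generates high-quality videos in 4 inference steps, a 25× speedup over the original model with no loss in quality.

Text-to-video · License: Apache-2.0 · #few-step diffusion #text-to-video #efficient distillation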

Downloads: 49

Released: 3/16/2025

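Model overview

TDM uses trajectory distribution matching to distill knowledge from a teacher model (e.g. CogVideoX-2B), enabling high-quality text-to-video generation in very few steps (4).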

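Features

Fast inference

Generates high-quality videos in only 4 inference steps, 25× faster than the original model

No loss in quality

Keeps generation quality while drastically cutting the number of steps; in the authors' user study (Pixart-α teacher), TDM samples are hard to tell apart from the teacher's

Efficient training

Distillation takes only 500 training iterations and 2 A800 GPU hours (as reported for distilling Pixart-α)

Broad compatibility

LoRA adapters are also provided for SD3, Dreamshaper and other base models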
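The adapters listed above are LoRA weights. The SD3 example further down sets the adapter scale to 0.125, and sets it to 0 to fall back to the teacher. The NumPy code below is a minimal sketch of what that scale does. It assumes the generic LoRA update W' = W + scale · (B A) rather than anything TDM-specific, and merges a rank-2 adapter into a toy weight matrix. The matrix sizes and random values are purely illustrative. In PEFT/diffusers the effective multiplier also includes lora_alpha / r; here that factor is folded into `scale`.

```python
import numpy as np


def apply_lora(W, A, B, scale):
    """Merge a low-rank update into W: W + scale * (B @ A) (generic LoRA, illustrative)."""
    return W + scale * (B @ A)


rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 2
W = rng.standard_normal((d_out, d_in))  # frozen base weight (toy values)
A = rng.standard_normal((rank, d_in))   # LoRA down-projection (toy values)
B = rng.standard_normal((d_out, rank))  # LoRA up-projection (toy values)

W_student = apply_lora(W, A, B, scale=0.125)  # adapter active, like set_adapters(["tdm"], [0.125])
W_teacher = apply_lora(W, A, B, scale=0.0)    # scale 0 leaves the base weight unchanged

delta = W_student - W
print(np.linalg.matrix_rank(delta))  # the update has rank at most `rank`
```

Because the update is only a scaled low-rank delta, a single base model can switch between student and teacher behaviour by changing one number. That is why the SD3 example can reuse the same pipeline for both samples.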

Capabilities

Text-to-video generation

High-quality image generation

Fast few-step inference

Model distillation

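Use cases

Content creation

Short video generation

Quickly produce short-form video content for social media

Generates a 49-frame video in 4 inference steps

Creative visualization

Turn text descriptions into visual content quickly

Greatly speeds up generation while preserving artistic style

Education and entertainment

Interactive storytelling

Generate animated story scenes on the fly

Enables near-real-time interactive experiences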
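To give a sense of scale, the 49 frames mentioned above make a clip of about six seconds when exported at fps=8, as the CogVideoX example on this page does. The sketch below does that arithmetic. The latent-frame helper additionally assumes a VAE that keeps the first frame and compresses the remaining frames 4× in time. That is an assumption about CogVideoX, not something stated on this page, and both helper names are hypothetical.

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip exported at a fixed frame rate."""
    return num_frames / fps


def latent_frame_count(num_frames: int, temporal_compression: int = 4) -> int:
    """Latent frames for a VAE that keeps frame 0 and compresses the rest (assumed layout)."""
    if (num_frames - 1) % temporal_compression != 0:
        raise ValueError("num_frames must equal k * temporal_compression + 1")
    return (num_frames - 1) // temporal_compression + 1


print(clip_seconds(49, 8))     # 6.125
print(latent_frame_count(49))  # 13
```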

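🚀 TDM: Learning Few-Step Diffusion Models by Trajectory Distribution Matching

This project proposes a method for learning few-step diffusion models by trajectory distribution matching. It enables fast text-to-image and text-to-video generation, greatly increasing generation speed without degrading quality.

🚀 Quick Start

This is the official code base for the paper "Learning Few-Step Diffusion Models by Trajectory Distribution Matching" by Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai and Jing Tang.

GitHub repository: https://github.com/Luo-Yihong/TDM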

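✨ Key Features

User study

Which do you think is better? Some images were generated by Pixart-α (50 NFE) and others by TDM (4 NFE). TDM is distilled from Pixart-α in a data-free manner, with only 500 training iterations and 2 A800 GPU hours.

Answer (position of the TDM image, left to right): bottom, bottom, top, bottom, top.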

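Fast text-to-video generation

TDM extends readily to text-to-video generation.

(Videos: Teacher, Student)

The video above was generated by CogVideoX-2B (100 NFE). In the same amount of time, TDM (4 NFE) can generate 25 videos, as shown below: a 25× speedup without performance degradation. (Note: the noise in the GIFs comes from compression.)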
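The 25× figure is a ratio of network function evaluations (NFE). Below is a minimal sketch of the bookkeeping. It assumes that classifier-free guidance (guidance_scale > 1) costs two evaluations per step, one conditional and one unconditional prediction. It also assumes the CogVideoX-2B teacher ran 50 steps with guidance, since the page only states 100 NFE. The count ignores text encoding and VAE decoding, so the wall-clock speedup can differ.

```python
def nfe(num_inference_steps: int, guidance_scale: float) -> int:
    """Network function evaluations for one sampling run.

    With classifier-free guidance (guidance_scale > 1) each step needs a
    conditional and an unconditional prediction, i.e. two evaluations.
    """
    return num_inference_steps * (2 if guidance_scale > 1 else 1)


teacher = nfe(50, 6.0)  # assumed CogVideoX-2B teacher setting -> 100 NFE
student = nfe(4, 1.0)   # TDM example: 4 steps, guidance_scale=1 -> 4 NFE
print(teacher / student)  # 25.0

# The SD3 comparison on this page: 28 steps with guidance_scale=7 -> 56 NFE
print(nfe(28, 7.0))  # 56
```

This also explains why every TDM example on this page passes guidance_scale=1. Guidance would double the student's cost back to 8 NFE.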


💻 Usage Examples

Basic usage

TDM-SD3-LoRA

import torch
from diffusers import StableDiffusion3Pipeline, AutoencoderTiny, DPMSolverMultistepScheduler
from diffusers.utils import make_image_grid

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("Luo-Yihong/TDM_sd3_lora", adapter_name="tdm")  # Load TDM-LoRA
pipe.set_adapters(["tdm"], [0.125])  # IMPORTANT: set the LoRA scale to 0.125.
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd3", torch_dtype=torch.float16)  # Tiny VAE to save GPU memory
pipe.vae.config.shift_factor = 0.0
pipe = pipe.to("cuda")
# Reuse Sana's DPM-Solver scheduler config for sampling
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
pipe.scheduler.config["flow_shift"] = 6  # flow_shift can be set anywhere from 1 to 6
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Student: TDM, 4 steps, no classifier-free guidance
generator = torch.manual_seed(8888)
image = pipe(
    prompt="A cute panda holding a sign says TDM SOTA!",
    negative_prompt="",
    num_inference_steps=4,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=1.0,
    generator=generator,
).images[0]

# Teacher: SD3 with the LoRA disabled, 28 steps with guidance
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="scheduler")
pipe.set_adapters(["tdm"], [0.0])  # Disable the LoRA (scale 0) to sample from the teacher
generator = torch.manual_seed(8888)
teacher_image = pipe(
    prompt="A cute panda holding a sign says TDM SOTA!",
    negative_prompt="",
    num_inference_steps=28,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=7.0,
    generator=generator,
).images[0]
make_image_grid([image, teacher_image], 1, 2)  # left: TDM, right: teacher

On the right is a sample from SD3 with 56 NFE; on the left is a sample from TDM with 4 NFE. Which do you think is better?
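The SD3 example sets `flow_shift`, which its comment says can range from 1 to 6. The sketch below is a hedged illustration of that knob under two assumptions. First, the scheduler spaces flow-matching noise levels evenly up to 1 − 1/T. Second, it then applies the common time shift σ' = s·σ / (1 + (s − 1)·σ). This is our understanding of how diffusers' flow-sigma option behaves, not something verified against the scheduler source here. Under these assumptions, a larger shift moves all 4 sampling steps toward higher noise levels. The function name is hypothetical.

```python
def shifted_flow_sigmas(num_steps: int, shift: float, num_train_timesteps: int = 1000) -> list:
    """Noise levels for a few-step flow-matching schedule with a time shift (assumed form)."""
    n = num_steps + 1
    # Evenly spaced alphas from 1 down to 1/T, so sigmas run from 0 up to 1 - 1/T
    alphas = [1 + (1 / num_train_timesteps - 1) * i / (n - 1) for i in range(n)]
    sigmas = [1 - a for a in alphas]
    # Time shift: shift > 1 pushes every sigma toward 1 (more noise); shift = 1 is the identity
    shifted = [shift * s / (1 + (shift - 1) * s) for s in sigmas]
    # Highest noise first; drop the trailing sigma = 0 entry
    return list(reversed(shifted))[:-1]


for s in (1, 3, 6):
    print(s, [round(x, 3) for x in shifted_flow_sigmas(4, s)])
```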

TDM-Dreamshaper-v7-LoRA

import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from huggingface_hub import hf_hub_download

repo_name = "Luo-Yihong/TDM_dreamshaper_LoRA"
ckpt_name = "tdm_dreamshaper.pt"
pipe = DiffusionPipeline.from_pretrained("lykon/dreamshaper-7", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))  # Load TDM-LoRA from the downloaded checkpoint
# SD 1.5 DPM-Solver scheduler config; runwayml/stable-diffusion-v1-5 is no longer on the Hub, so use its mirror
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="scheduler")
generator = torch.manual_seed(317)
image = pipe(
    prompt="A close-up photo of an Asian lady with sunglasses",
    negative_prompt="",
    num_inference_steps=4,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.0,
).images[0]
image  # Displays the image in a notebook


TDM-CogVideoX-2B-LoRA

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.vae.enable_slicing() # Save memory
pipe.vae.enable_tiling() # Save memory
pipe.load_lora_weights("Luo-Yihong/TDM_CogVideoX-2B_LoRA")
pipe.to("cuda")
prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The "
    "panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance"
)
# We train the generator on timesteps [999, 856, 665, 399].
# The official CogVideoX scheduler uses uniform timestep spacing, which may give inferior results,
# but TDM-LoRA still works well with 4 NFE.
# We will update TDM-CogVideoX-LoRA soon for better performance!
generator = torch.manual_seed(8888)
frames = pipe(
    prompt,
    guidance_scale=1,  # No classifier-free guidance
    num_inference_steps=4,
    num_frames=49,
    generator=generator,
).frames[0]
export_to_video(frames, "output-TDM.mp4", fps=8)
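The comments in the example above say TDM was trained on timesteps [999, 856, 665, 399], while the stock scheduler spaces 4 steps uniformly. The sketch below computes that uniform schedule and how far each trained timestep sits from it. It assumes the CogVideoX-2B scheduler uses DDIM-style "trailing" spacing over 1000 training timesteps, which is our assumption and is not stated on this page. The helper name is hypothetical.

```python
TRAINED_TIMESTEPS = [999, 856, 665, 399]  # from the comment in the CogVideoX example


def trailing_timesteps(num_inference_steps: int, num_train_timesteps: int = 1000) -> list:
    """Uniform 'trailing' spacing: T, T - T/n, ..., each shifted down by one (assumed rule)."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [round(num_train_timesteps - i * step_ratio) - 1 for i in range(num_inference_steps)]


uniform = trailing_timesteps(4)
gaps = [t - u for t, u in zip(TRAINED_TIMESTEPS, uniform)]
print(uniform)  # [999, 749, 499, 249]
print(gaps)     # [0, 107, 166, 150]
```

Under this assumption, the later sampling steps run at lower noise levels than the ones TDM was trained on. That is consistent with the authors' note that uniform spacing may give inferior results.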


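📚 Detailed Documentation

Pre-trained models

We have released a series of TDM-LoRA models; feel free to try them!

TODO

Release the training code.

Contact

If you have any questions about this work, please contact Yihong Luo (yluocg@connect.ust.hk).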

Citation

@misc{luo2025tdm,
      title={Learning Few-Step Diffusion Models by Trajectory Distribution Matching}, 
      author={Yihong Luo and Tianyang Hu and Jiacheng Sun and Yujun Cai and Jing Tang},
      year={2025},
      eprint={2503.06674},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.06674}, 
}