TDM_CogVideoX-2B_LoRA Open-Source Video Generation Model: High-Quality Video in 4 Inference Steps, 25x Faster


TDM CogVideoX 2B LoRA

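Developed by Luo-Yihong

TDM is a model that achieves efficient few-step diffusion through trajectory distribution matching. It generates high-quality videos in just 4 inference steps, which the authors report as a 25x speedup over the original model with no loss in quality.

Text-to-Video · License: Apache-2.0 · #few-step-diffusion #text-to-video #efficient-distillation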


Release date: 3/16/2025

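Model Overview

Using a novel trajectory distribution matching technique, TDM distills knowledge from a teacher model (such as CogVideoX-2B) to generate high-quality video from text in very few (4) steps.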

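Features

Fast Inference

Generates high-quality video in only 4 inference steps, a 25x speedup over the original model

No Quality Loss

Preserves generation quality despite far fewer inference steps; in the authors' user study, TDM results were hard to tell apart from the teacher model's

Efficient Training

Distillation takes only 500 training iterations and 2 A800 GPU hours (figures reported for distilling PixArt-α)

Broad Compatibility

LoRA adapters are provided for several base models, including SD3, Dreamshaper-v7, and CogVideoX-2B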
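Because each TDM release is a LoRA adapter, it adds a low-rank update on top of the base model's frozen weights instead of replacing them. The sketch below is a generic LoRA illustration (not code from the TDM repository) of what an adapter weight such as the `0.125` passed to `set_adapters` in the SD3 example does: the effective weight becomes W + scale · (B A). The helpers `matmul` and `apply_lora` are hypothetical, and the alpha/rank factor that real LoRA layers also apply is omitted for clarity.

```python
# Generic LoRA illustration (assumption: simplified, no alpha/rank factor).
# Effective weight of one linear layer: W_eff = W + scale * (B @ A).

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def apply_lora(w, b, a, scale):
    """Return W + scale * (B @ A) for a single linear layer."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[2.0], [4.0]]
A = [[1.0, 1.0]]

print(apply_lora(W, B, A, 0.125))  # [[1.25, 0.25], [0.5, 1.5]]
print(apply_lora(W, B, A, 0.0))    # [[1.0, 0.0], [0.0, 1.0]], the base weights
```

A scale of 0.0 reproduces the base weights exactly, which is why setting the TDM adapter weight to 0 lets the same pipeline sample from the teacher.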

Model Capabilities

Text-to-video generation

High-quality image generation

Fast few-step inference

Model distillation

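Use Cases

Content Creation

Short-Video Generation

Quickly generate short-form video content for social media

Generates a 49-frame video in 4 inference steps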
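The 49-frame figure corresponds to `num_frames=49` in the CogVideoX-2B example, which exports the result at `fps=8`. A quick sketch of what that means in practice; the temporal compression factor of 4 for the CogVideoX VAE is an assumption about the base model, not something this page states.

```python
# Rough arithmetic for the 49-frame CogVideoX-2B clip.
NUM_FRAMES = 49           # num_frames passed to the pipeline
FPS = 8                   # fps passed to export_to_video
TEMPORAL_COMPRESSION = 4  # assumed VAE temporal compression (first frame kept as-is)

duration_s = NUM_FRAMES / FPS
latent_frames = (NUM_FRAMES - 1) // TEMPORAL_COMPRESSION + 1

print(f"clip length: {duration_s:.3f} s")  # clip length: 6.125 s
print(f"latent frames: {latent_frames}")   # latent frames: 13
```

So each of the 4 denoising steps operates on roughly 13 latent frames that decode to about 6 seconds of video at 8 fps.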

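Creative Visualization

Quickly turn text descriptions into visual content

Greatly speeds up generation while preserving artistic style

Education and Entertainment

Interactive Storytelling

Generate animated story scenes on the fly

Enables near-real-time interactive experiences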

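🚀 TDM: Learning Few-Step Diffusion Models by Trajectory Distribution Matching

This project proposes a method for learning few-step diffusion models via trajectory distribution matching. It enables fast text-to-image and text-to-video generation, substantially increasing generation speed without degrading quality.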

🚀 Quick Start

This is the official codebase for the paper "Learning Few-Step Diffusion Models by Trajectory Distribution Matching" by Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, and Jing Tang.

GitHub repository: https://github.com/Luo-Yihong/TDM

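✨ Key Features

User Study

[Image: user study] Which do you think is better? Some of the images were generated by PixArt-α (50 NFE) and the others by TDM (4 NFE). TDM is distilled from PixArt-α in a data-free manner, using only 500 training iterations and 2 A800 GPU hours.

Answer:

Position of the TDM image in each pair (left to right): bottom, bottom, top, bottom, top.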

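Fast Text-to-Video Generation

The proposed TDM extends readily to text-to-video generation.

[Video comparison: Teacher (CogVideoX-2B) vs. Student (TDM)]

The teacher video was generated by CogVideoX-2B with 100 NFE. In the same amount of time, TDM (4 NFE) can generate 25 videos, an impressive 25x speedup with no drop in quality. (Note: the noise in the GIFs comes from compression.)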
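The 25x figure follows from counting network function evaluations (NFE). A minimal sketch, assuming that classifier-free guidance (enabled in diffusers pipelines when `guidance_scale > 1`) costs two evaluations per step and that the teacher's 100 NFE corresponds to 50 steps with guidance; the source states only the 100 NFE total, and `nfe` and `speedup` are hypothetical helpers.

```python
def nfe(num_steps: int, guidance_scale: float) -> int:
    """Network function evaluations needed for one sample.

    Assumption: classifier-free guidance (guidance_scale > 1) costs two
    evaluations per step, one conditional and one unconditional.
    """
    evals_per_step = 2 if guidance_scale > 1 else 1
    return num_steps * evals_per_step


def speedup(teacher_nfe: int, student_nfe: int) -> float:
    """Ideal speedup when every evaluation costs the same (same backbone and resolution)."""
    return teacher_nfe / student_nfe


# Assumption: the teacher's 100 NFE = 50 steps with CFG enabled.
teacher = nfe(num_steps=50, guidance_scale=6.0)
# TDM setting from the CogVideoX example: 4 steps, guidance_scale=1 (CFG off).
student = nfe(num_steps=4, guidance_scale=1.0)

print(teacher, student, speedup(teacher, student))  # 100 4 25.0
```

The same count gives the 56 NFE quoted for the SD3 teacher (28 steps with `guidance_scale=7`). The ratio ignores text encoding and VAE decoding, whose cost does not shrink with fewer denoising steps, so measured wall-clock speedups may be somewhat lower.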


💻 Usage Examples

Basic Usage

TDM-SD3-LoRA

import torch
from diffusers import StableDiffusion3Pipeline, AutoencoderTiny, DPMSolverMultistepScheduler
from diffusers.utils import make_image_grid

SANA_SCHEDULER = "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers"
prompt = "A cute panda holding a sign says TDM SOTA!"

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe.load_lora_weights("Luo-Yihong/TDM_sd3_lora", adapter_name="tdm")  # Load TDM-LoRA
pipe.set_adapters(["tdm"], [0.125])  # IMPORTANT: set the LoRA scale to 0.125.
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd3", torch_dtype=torch.float16)  # Tiny VAE to save GPU memory
pipe.vae.config.shift_factor = 0.0  # Disable the latent shift the pipeline applies before decoding
pipe = pipe.to("cuda")

# Multistep DPM-Solver using Sana's flow-matching scheduler config; flow_shift can be set from 1 to 6.
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained(SANA_SCHEDULER, subfolder="scheduler", flow_shift=6.0)
generator = torch.manual_seed(8888)
image = pipe(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=4,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=1.0,  # CFG off: 4 steps = 4 NFE
    generator=generator,
).images[0]

# Teacher baseline: default scheduler settings, TDM-LoRA weight set to 0.
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained(SANA_SCHEDULER, subfolder="scheduler")
pipe.set_adapters(["tdm"], [0.0])  # Disable TDM-LoRA without unloading it
generator = torch.manual_seed(8888)
teacher_image = pipe(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    height=1024,
    width=1024,
    num_images_per_prompt=1,
    guidance_scale=7.0,  # CFG on: 28 steps = 56 NFE
    generator=generator,
).images[0]
grid = make_image_grid([image, teacher_image], rows=1, cols=2)  # Left: TDM, right: SD3 teacher
grid.save("tdm_vs_sd3.png")

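[Image: sd3_compare] Right: a sample generated by SD3 with 56 NFE (28 steps with classifier-free guidance). Left: a sample generated by TDM with 4 NFE. Which do you think is better?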
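The SD3 example sets `flow_shift=6` and notes that values from 1 to 6 work. As a hedged illustration, the sketch below applies the standard flow-matching time shift σ' = sσ / (1 + (s − 1)σ) used by SD3-style models; we assume this is what `flow_shift` controls in `DPMSolverMultistepScheduler`, and the evenly spaced input grid is illustrative rather than the exact grid the scheduler builds. `shift_sigma` is a hypothetical helper.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Flow-matching time shift: sigma' = shift * sigma / (1 + (shift - 1) * sigma)."""
    return shift * sigma / (1 + (shift - 1) * sigma)


# Illustrative, evenly spaced noise levels for a 4-step sampler (assumption,
# not the exact grid DPMSolverMultistepScheduler builds).
base_sigmas = [1.0, 0.75, 0.5, 0.25]

for shift in (1.0, 3.0, 6.0):
    shifted = [round(shift_sigma(s, shift), 3) for s in base_sigmas]
    print(f"flow_shift={shift}: {shifted}")
```

With a shift of 1 the grid is unchanged; at 3 it becomes [1.0, 0.9, 0.75, 0.5] and at 6 it becomes [1.0, 0.947, 0.857, 0.667], so larger shifts move every intermediate step toward higher noise levels.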

TDM-Dreamshaper-v7-LoRA

import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from huggingface_hub import hf_hub_download

repo_name = "Luo-Yihong/TDM_dreamshaper_LoRA"
ckpt_name = "tdm_dreamshaper.pt"
pipe = DiffusionPipeline.from_pretrained("lykon/dreamshaper-7", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo_name, ckpt_name))  # Download and load TDM-LoRA
# runwayml/stable-diffusion-v1-5 has been removed from the Hub; this mirror hosts the same SD 1.5 files.
pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="scheduler")
generator = torch.manual_seed(317)
image = pipe(
    prompt="A close-up photo of an Asian lady with sunglasses",
    negative_prompt="",
    num_inference_steps=4,
    num_images_per_prompt=1,
    generator=generator,
    guidance_scale=1.0,  # CFG off: 4 steps = 4 NFE
).images[0]
image.save("tdm_dreamshaper.png")

[Image: tdm_dreamshaper]

TDM-CogVideoX-2B-LoRA

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)
pipe.vae.enable_slicing() # Save memory
pipe.vae.enable_tiling() # Save memory
pipe.load_lora_weights("Luo-Yihong/TDM_CogVideoX-2B_LoRA")
pipe.to("cuda")
prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The "
    "panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance"
)
# We trained the generator on the timesteps [999, 856, 665, 399].
# The official CogVideoX scheduler uses uniform spacing, which may give inferior results,
# but TDM-LoRA still works well at 4 NFE.
# We will release an updated TDM-CogVideoX-LoRA with better performance soon!
generator = torch.manual_seed(8888)
frames = pipe(
    prompt,
    guidance_scale=1,  # CFG off: 4 steps = 4 NFE
    num_inference_steps=4,
    num_frames=49,
    generator=generator,
).frames[0]
export_to_video(frames, "output-TDM.mp4", fps=8)
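The comments in the CogVideoX example say TDM was trained on timesteps [999, 856, 665, 399], while the stock scheduler spaces its steps uniformly. To see how far apart the two schedules are, the sketch below reproduces uniform "trailing" spacing over 1000 training timesteps. Treating the CogVideoX-2B scheduler as using "trailing" spacing is an assumption, and `trailing_timesteps` is a hypothetical helper, not a diffusers API.

```python
NUM_TRAIN_TIMESTEPS = 1000
TDM_TIMESTEPS = [999, 856, 665, 399]  # timesteps TDM was trained on (from the example's comments)


def trailing_timesteps(num_steps: int, num_train: int = NUM_TRAIN_TIMESTEPS) -> list[int]:
    """Uniformly spaced timesteps with 'trailing' spacing (assumed for the CogVideoX scheduler)."""
    step_ratio = num_train / num_steps
    return [round(num_train - i * step_ratio) - 1 for i in range(num_steps)]


uniform = trailing_timesteps(4)
gap = [t - u for t, u in zip(TDM_TIMESTEPS, uniform)]

print("uniform:", uniform)        # uniform: [999, 749, 499, 249]
print("TDM:    ", TDM_TIMESTEPS)
print("gap:    ", gap)            # gap:     [0, 107, 166, 150]
```

Under this assumption, the uniform schedule runs steps 2 to 4 at noticeably lower noise levels than those TDM was trained on, which is consistent with the authors' warning that uniform spacing may give inferior results.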


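📚 Detailed Documentation

Pretrained Models

We have released a series of TDM-LoRA models, including Luo-Yihong/TDM_sd3_lora, Luo-Yihong/TDM_dreamshaper_LoRA, and Luo-Yihong/TDM_CogVideoX-2B_LoRA used in the examples above. Feel free to try them!

TODO

Release the training code.

Contact

If you have any questions about this work, please contact Yihong Luo (yluocg@connect.ust.hk).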

Citation

@misc{luo2025tdm,
      title={Learning Few-Step Diffusion Models by Trajectory Distribution Matching}, 
      author={Yihong Luo and Tianyang Hu and Jiacheng Sun and Yujun Cai and Jing Tang},
      year={2025},
      eprint={2503.06674},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.06674}, 
}