🚀 Fine-tuned model based on THUDM/CogVideoX-5b
This model is a fine-tune of [THUDM/CogVideoX-5b](https://huggingface.co/THUDM/CogVideoX-5b) on the [finetrainers/crush-smol](https://huggingface.co/datasets/finetrainers/crush-smol) dataset. A LoRA variant of the parameters is also provided.
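Project information
| Property | Details |
| --- | --- |
| Base model | THUDM/CogVideoX-5b |
| Training dataset | finetrainers/crush-smol |
| Library | diffusers |
| License | Other (see the [license](https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)) |
| Example prompt | DIFF_crush A red candle is placed on a metal platform, and a large metal cylinder descends from above, flattening the candle as if it were under a hydraulic press. The candle is crushed into a flat, round shape, leaving a pile of debris around it. |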
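Examples
Example prompts and the videos generated from them:
- **Prompt**: DIFF_crush A red candle is placed on a metal platform, and a large metal cylinder descends from above, flattening the candle as if it were under a hydraulic press. The candle is crushed into a flat, round shape, leaving a pile of debris around it.
  **Output video**: [view](./assets/output_0.mp4)
- **Prompt**: DIFF_crush A bulb is placed on a wooden platform, and a large metal cylinder descends from above, crushing the bulb as if it were under a hydraulic press. The bulb is crushed into a flat, round shape, leaving a pile of debris around it.
  **Output video**: [view](./assets/output_1.mp4)
- **Prompt**: DIFF_crush A thick burger is placed on a dining table, and a large metal cylinder descends from above, crushing the burger as if it were under a hydraulic press. The bulb is crushed, leaving a pile of debris around it.
  **Output video**: [view](./assets/output_2.mp4)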
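All three example prompts start with the `DIFF_crush` trigger token and follow the same sentence pattern. The sketch below builds prompts of that shape. The helper name `build_crush_prompt` and its parameters are hypothetical, and the wording is a paraphrase of the examples rather than a template confirmed by the authors. Because the checkpoint is experimental, prompts that stray from the training examples may not produce the crush effect.

```python
TRIGGER = "DIFF_crush"  # trigger token that prefixes every example prompt


def build_crush_prompt(subject: str, surface: str) -> str:
    """Fill a template modeled on the example prompts (hypothetical helper)."""
    return (
        f"{TRIGGER} {subject} is placed on {surface}, and a large metal cylinder "
        "descends from above, crushing the object as if it were under a hydraulic press. "
        "The object is crushed, leaving a pile of debris around it."
    )


prompt = build_crush_prompt("A red candle", "a metal platform")
print(prompt)
```

The returned string can be passed as the `prompt` argument of the pipeline call.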
Tags
- text-to-video
- diffusers-training
- diffusers
- cogvideox
- cogvideox-diffusers
- template:sd-lora
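Project code
The project code is available on [GitHub](https://github.com/a-r-r-o-w/finetrainers).
⚠️ Important note
This is an experimental checkpoint and is known to generalize poorly.
🚀 Quick start
Inference code
The following example runs inference with the fine-tuned model: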
import torch
from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video

# Load the fine-tuned transformer weights in bfloat16.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "finetrainers/crush-smol-v0", torch_dtype=torch.bfloat16
)

# Swap the fine-tuned transformer into the base CogVideoX-5b pipeline.
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# The prompt starts with the DIFF_crush trigger token used in the examples.
prompt = """
DIFF_crush A thick burger is placed on a dining table, and a large metal cylinder descends from above, crushing the burger as if it were under a hydraulic press. The bulb is crushed, leaving a pile of debris around it.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]

# Write the generated frames to an MP4 file at 25 fps.
export_to_video(video, "output.mp4", fps=25)
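To make the generation settings concrete, the sketch below works out the clip length implied by `num_frames=81` at `fps=25` (3.24 seconds) and estimates the latent grid the transformer operates on. The function `clip_summary` is hypothetical. The compression factors (4 in time, 8 in space) are assumptions about the CogVideoX VAE and are not stated in this model card.

```python
def clip_summary(num_frames: int, height: int, width: int, fps: int,
                 temporal_factor: int = 4, spatial_factor: int = 8) -> dict:
    """Return the clip duration and an assumed latent grid size.

    temporal_factor and spatial_factor are assumed VAE compression ratios.
    """
    return {
        "duration_s": num_frames / fps,
        # The first frame is kept, the rest are compressed in time.
        "latent_frames": (num_frames - 1) // temporal_factor + 1,
        "latent_height": height // spatial_factor,
        "latent_width": width // spatial_factor,
    }


summary = clip_summary(num_frames=81, height=512, width=768, fps=25)
print(summary)
```

Under these assumptions, frame counts of the form 4k + 1 (such as 81) map cleanly onto a whole number of latent frames.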
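Training logs
The training logs are available on WandB: [view run](https://wandb.ai/sayakpaul/finetrainers-cogvideox/runs/ngcsyhom)
LoRA
We extracted a rank-64 LoRA from the fine-tuned checkpoint using an extraction script. This LoRA can be used to reproduce the same effect:
Code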
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the unmodified base pipeline; the LoRA is applied on top of it.
pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
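# Assumption: the LoRA file lives in the fine-tuned model's own repo (crush-smol-v0), matching its file name.
pipeline.load_lora_weights("finetrainers/crush-smol-v0", weight_name="extracted_crush_smol_lora_64.safetensors")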

prompt = """
DIFF_crush A thick burger is placed on a dining table, and a large metal cylinder descends from above, crushing the burger as if it were under a hydraulic press. The bulb is crushed, leaving a pile of debris around it.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]

export_to_video(video, "output_lora.mp4", fps=25)
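This card does not describe how the rank-64 LoRA was extracted from the fine-tuned checkpoint. One common approach, assumed here purely for illustration, is a truncated singular value decomposition of the per-layer difference between the fine-tuned and base weights. The symbols $W_{\text{ft}}$, $W_{\text{base}}$, $A$, and $B$ are notation introduced for this sketch.

```latex
\Delta W = W_{\text{ft}} - W_{\text{base}} = U \Sigma V^{\top}
\quad\Longrightarrow\quad
\Delta W \approx B A,
\qquad
B = U_{:,\,1:r}\,\Sigma_{1:r}^{1/2},
\quad
A = \Sigma_{1:r}^{1/2}\,V_{:,\,1:r}^{\top},
\quad
r = 64
```

Keeping only the top $r$ singular values gives the best rank-$r$ approximation of $\Delta W$ in the Frobenius norm, which is why a LoRA extracted this way can approximate, but not exactly match, the fully fine-tuned model.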