🚀 Fine-tuned model based on THUDM/CogVideoX-5b
This project is a fine-tune of the [THUDM/CogVideoX-5b](https://huggingface.co/THUDM/CogVideoX-5b) model on the [finetrainers/crush-smol](https://huggingface.co/datasets/finetrainers/crush-smol) dataset. We also provide a LoRA variant of the parameters.
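Project information
| Property | Details |
| --- | --- |
| Base model | THUDM/CogVideoX-5b |
| Training dataset | finetrainers/crush-smol |
| Library name | diffusers |
| License | Other (see the [license](https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE)) |
| Example prompt | DIFF_crush A red candle is placed on a metal platform, and a large metal cylinder descends from above, flattening the candle as if it were under a hydraulic press. The candle is crushed into a flat, round shape, leaving a pile of debris around it. |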
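Examples
Below are some example prompts and their corresponding output videos:
- **Prompt**: DIFF_crush A red candle is placed on a metal platform, and a large metal cylinder descends from above, flattening the candle as if it were under a hydraulic press. The candle is crushed into a flat, round shape, leaving a pile of debris around it.
  **Output video**: [view](./assets/output_0.mp4)
- **Prompt**: DIFF_crush A bulb is placed on a wooden platform, and a large metal cylinder descends from above, crushing the bulb as if it were under a hydraulic press. The bulb is crushed into a flat, round shape, leaving a pile of debris around it.
  **Output video**: [view](./assets/output_1.mp4)
- **Prompt**: DIFF_crush A thick burger is placed on a dining table, and a large metal cylinder descends from above, crushing the burger as if it were under a hydraulic press. The bulb is crushed, leaving a pile of debris around it.
  **Output video**: [view](./assets/output_2.mp4)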
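The example prompts above share one shape: the `DIFF_crush` trigger token, a subject resting on a surface, the descending metal cylinder, and a sentence describing the aftermath. A minimal sketch of assembling prompts in that shape follows. The helper name `build_crush_prompt`, its parameters, and the watermelon example are hypothetical and are not part of this project or of diffusers.

```python
TRIGGER = "DIFF_crush"  # trigger token that prefixes every example prompt on this card


def build_crush_prompt(subject: str, surface: str, aftermath: str) -> str:
    """Assemble a prompt shaped like the card's examples (hypothetical helper)."""
    return (
        f"{TRIGGER} {subject} is placed on {surface}, and a large metal cylinder "
        f"descends from above, crushing it as if it were under a hydraulic press. "
        f"{aftermath}"
    )


# Illustrative subject that does not appear in the card's examples.
prompt = build_crush_prompt(
    subject="A ripe watermelon",
    surface="a steel workbench",
    aftermath="The watermelon is flattened, leaving a pile of debris around it.",
)
print(prompt)
```

Since the checkpoint is described as generalizing poorly, prompts that stay close to this wording are probably the safer choice.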
Tags
- text-to-video
- diffusers-training
- diffusers
- cogvideox
- cogvideox-diffusers
- template:sd-lora
Project code
The project code is available on [GitHub](https://github.com/a-r-r-o-w/finetrainers).
⚠️ Important note
This is an experimental checkpoint, and it is known to generalize poorly.
🚀 Quick start
Inference code
The following example runs inference with the fine-tuned model:
from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the fine-tuned transformer weights.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "finetrainers/crush-smol-v0", torch_dtype=torch.bfloat16
)
# Build the base pipeline and swap in the fine-tuned transformer.
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

prompt = """
DIFF_crush A thick burger is placed on a dining table, and a large metal cylinder descends from above, crushing the burger as if it were under a hydraulic press. The bulb is crushed, leaving a pile of debris around it.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

# Generate 81 frames at 512x768 and keep the first (only) video in the batch.
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=25)
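To make the generation settings above more concrete, the sketch below works out what `num_frames=81`, `height=512`, `width=768`, and `fps=25` imply for clip length and latent size. The compression factors (4x temporal and 8x spatial in the VAE, 2x2 patches in the transformer) are assumptions about CogVideoX-5b and are not stated on this card. The `VideoShape` and `describe` names are hypothetical.

```python
from dataclasses import dataclass

# Assumed CogVideoX-5b constants; they are not stated on this card.
TEMPORAL_COMPRESSION = 4  # VAE temporal downsampling factor
SPATIAL_COMPRESSION = 8   # VAE spatial downsampling factor
PATCH_SIZE = 2            # transformer patch size along height and width


@dataclass(frozen=True)
class VideoShape:
    num_frames: int
    height: int
    width: int
    fps: int


def describe(shape: VideoShape) -> dict:
    """Derive clip duration and latent dimensions from the generation settings."""
    latent_frames = (shape.num_frames - 1) // TEMPORAL_COMPRESSION + 1
    latent_h = shape.height // SPATIAL_COMPRESSION
    latent_w = shape.width // SPATIAL_COMPRESSION
    tokens = latent_frames * (latent_h // PATCH_SIZE) * (latent_w // PATCH_SIZE)
    return {
        "duration_s": shape.num_frames / shape.fps,
        "latent_frames": latent_frames,
        "latent_hw": (latent_h, latent_w),
        "video_tokens": tokens,
    }


info = describe(VideoShape(num_frames=81, height=512, width=768, fps=25))
print(info)
```

Under these assumptions, 81 frames has the form 4k + 1, which keeps the temporal compression exact, and both 512 and 768 divide evenly by 16.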
Training logs
Training logs are available on WandB: [view](https://wandb.ai/sayakpaul/finetrainers-cogvideox/runs/ngcsyhom)
LoRA
We extracted a rank-64 LoRA from the fine-tuned checkpoint using an extraction script. This LoRA can be used to reproduce the same effect:
Code
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the unmodified base pipeline.
pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
# Load the rank-64 LoRA extracted from the fine-tuned checkpoint.
pipeline.load_lora_weights("finetrainers/crush-smol-v0", weight_name="extracted_crush_smol_lora_64.safetensors")

prompt = """
DIFF_crush A thick burger is placed on a dining table, and a large metal cylinder descends from above, crushing the burger as if it were under a hydraulic press. The bulb is crushed, leaving a pile of debris around it.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

# Same generation settings as the full fine-tune example.
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output_lora.mp4", fps=25)
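For intuition on what extracting a rank-64 LoRA from a fine-tuned checkpoint can involve, here is a minimal NumPy sketch of one common approach: take the difference between the fine-tuned and base weights of a layer and keep its best low-rank approximation via truncated SVD. This is an assumption about the general technique and not the project's actual extraction script. The function name `extract_lora`, the toy matrix sizes, and the rank of 4 are hypothetical.

```python
import numpy as np


def extract_lora(w_base: np.ndarray, w_tuned: np.ndarray, rank: int):
    """Factor the weight delta into low-rank `down` and `up` matrices via truncated SVD."""
    delta = w_tuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    lora_up = u[:, :rank] * sqrt_s           # shape (out_features, rank)
    lora_down = sqrt_s[:, None] * vt[:rank]  # shape (rank, in_features)
    return lora_down, lora_up


# Toy layer: the simulated fine-tuning update has true rank 4.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((32, 48))
true_delta = rng.standard_normal((32, 4)) @ rng.standard_normal((4, 48))
w_tuned = w_base + true_delta

down, up = extract_lora(w_base, w_tuned, rank=4)
reconstructed = w_base + up @ down
print(np.abs(reconstructed - w_tuned).max())
```

In the toy case the update is exactly rank 4, so the factorization reproduces it up to floating-point error. A real fine-tuning delta is generally not exactly low rank, so a rank-64 LoRA only approximates the full fine-tune.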