3DGS Dissolve Fine-tuned CogVideoX-5b
This project is a fine-tuned version of the THUDM/CogVideoX-5b model, trained on the finetrainers/3dgs-dissolve dataset. It generates videos from text prompts with a distinctive 3D dissolve effect.
Quick Start
Prerequisites
- Ensure the necessary dependencies are installed, including diffusers and torch.
Inference
from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the fine-tuned transformer and plug it into the base CogVideoX-5b pipeline.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "finetrainers/3dgs-v0", torch_dtype=torch.bfloat16
)
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# The prompt starts with the 3D_dissolve id token.
prompt = """
3D_dissolve In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks, creating a dramatic and explosive effect against a black background.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

# Generate 81 frames at 768x512 and take the first (only) video in the batch.
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=25)
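As a quick sanity check on the settings above, the following sketch works out how long the exported clip is and how many latent frames the model would process. The clip length follows directly from `num_frames=81` and `fps=25`. The latent frame count assumes a temporal compression ratio of 4 for the CogVideoX VAE, which is an assumption not stated in this document; the helper names are illustrative only.

```python
# Sketch: derive clip length and (assumed) latent frame count from the
# inference settings used in the Quick Start example.

NUM_FRAMES = 81            # num_frames passed to the pipeline
FPS = 25                   # fps passed to export_to_video
TEMPORAL_COMPRESSION = 4   # ASSUMPTION: temporal compression ratio of the VAE


def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Length of the exported video in seconds."""
    return num_frames / fps


def latent_frame_count(num_frames: int, ratio: int = TEMPORAL_COMPRESSION) -> int:
    """Number of latent frames, assuming the first frame is kept as-is and the
    remaining frames are compressed by `ratio` along the time axis."""
    return (num_frames - 1) // ratio + 1


duration = clip_duration_seconds(NUM_FRAMES, FPS)
latents = latent_frame_count(NUM_FRAMES)
print(f"{duration:.2f} s, {latents} latent frames")
```

Under these assumptions, 81 frames at 25 fps give a clip of about 3.24 seconds.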
Features
- Fine-tuned Model: Based on the THUDM/CogVideoX-5b model, fine-tuned on the finetrainers/3dgs-dissolve dataset.
- LoRA Variant: A LoRA variant of the parameters is provided, which can be used to achieve similar effects.
- Text-to-Video Generation: Capable of generating videos from text prompts with a unique 3D dissolve effect.
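To give a feel for why a LoRA variant is lighter than the full fine-tuned weights, the sketch below compares parameter counts for a single linear layer. A LoRA of rank r replaces a full d_out x d_in update with two factors of shapes d_out x r and r x d_in. The layer width of 3072 is a hypothetical value chosen for illustration and is not taken from this document; the rank of 64 matches the LoRA described under Advanced Usage.

```python
# Sketch: parameter count of a full weight update vs. a low-rank (LoRA) update
# for one linear layer. HIDDEN is a HYPOTHETICAL layer width for illustration.

HIDDEN = 3072   # ASSUMPTION: width of a square projection layer
RANK = 64       # LoRA rank, as stated for the extracted LoRA


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in a dense d_out x d_in weight update."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: (rank x d_in) and (d_out x rank)."""
    return rank * (d_in + d_out)


full = full_param_count(HIDDEN, HIDDEN)
lora = lora_param_count(HIDDEN, HIDDEN, RANK)
print(f"full: {full:,}  lora: {lora:,}  reduction: {full // lora}x")
```

For this hypothetical layer the low-rank factors hold 24 times fewer parameters than the dense update; the actual savings depend on the real layer shapes.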
Installation
No specific installation steps are provided. To run the inference code, install the required libraries, such as diffusers and torch.
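Before running the inference code, it can help to confirm that the required libraries are importable. The following is a minimal sketch using only the Python standard library; the `REQUIRED` tuple and the `missing_packages` helper are illustrative names, and the list covers only the two libraries named in this document.

```python
# Sketch: report which of the required top-level packages cannot be found.
import importlib.util

REQUIRED = ("diffusers", "torch")  # libraries named in this document


def missing_packages(names):
    """Return the top-level package names that are not importable."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```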
Usage Examples
Basic Usage
The inference code in the Quick Start section demonstrates the basic usage: generating a video from a text prompt.
Advanced Usage (LoRA)
We extracted a rank-64 LoRA from the fine-tuned checkpoint. You can use this LoRA to emulate the same kind of effect.
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")

# These are local paths from the original environment; replace them with the
# location of your own copies of the LoRA files.
pipeline.load_lora_weights("/fsx/sayak/finetrainers/cogvideox-crush/extracted_crush_smol_lora_64.safetensors", adapter_name="crush")
pipeline.load_lora_weights("/fsx/sayak/finetrainers/cogvideox-3dgs/extracted_3dgs_lora_64.safetensors", adapter_name="3dgs")

prompts = ["""
In a 3D appearance, a small bicycle is seen surrounded by a burst of fiery sparks, creating a dramatic and intense visual effect against the dark background.
The video showcases a dynamic explosion of fiery particles in a 3D appearance, with sparks and embers scattering across the screen against a stark black background.
""",
"""
In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks, creating a dramatic and explosive effect against a black background.
""",
]
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs, bad physique"
id_token = "3D_dissolve"

# Generate one video per prompt, prefixing each prompt with the id token.
for i, prompt in enumerate(prompts):
    video = pipeline(
        prompt=f"{id_token} {prompt}",
        negative_prompt=negative_prompt,
        num_frames=81,
        height=512,
        width=768,
        num_inference_steps=50,
        generator=torch.manual_seed(0),
    ).frames[0]
    export_to_video(video, f"output_{i}.mp4", fps=25)
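The loop above prefixes every prompt with the `3D_dissolve` id token, and the triple-quoted prompts carry leading and trailing newlines. The sketch below shows one possible way to normalize a prompt before passing it to the pipeline. The `build_prompt` helper is hypothetical and not part of diffusers or the original code, and whether whitespace normalization changes the generated output has not been verified here.

```python
# Sketch: a hypothetical helper that tidies a prompt and prepends the id token.

ID_TOKEN = "3D_dissolve"  # id token used in the examples in this document


def build_prompt(description: str, id_token: str = ID_TOKEN) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces and
    make sure the prompt starts with the id token exactly once."""
    cleaned = " ".join(description.split())
    if not cleaned:
        raise ValueError("prompt description is empty")
    # Skip the prefix if the first word is already the id token.
    if cleaned.split(" ", 1)[0] == id_token:
        return cleaned
    return f"{id_token} {cleaned}"


example = build_prompt("""
In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks.
""")
print(example)
```

With such a helper, the loop could pass `prompt=build_prompt(prompt)` in place of the f-string.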
Documentation
Model Information
| Property | Details |
|---|---|
| Model Type | Fine-tuned version of THUDM/CogVideoX-5b |
| Training Data | finetrainers/3dgs-dissolve |
| Library Name | diffusers |
Widgets
- Prompt 1: 3D_dissolve A small tiger character in a colorful winter outfit appears in a 3D appearance, surrounded by a dynamic burst of red sparks. The sparks swirl around the penguin, creating a dramatic effect as they gradually evaporate into a burst of red sparks, leaving behind a stark black background.
- Prompt 2: 3D_dissolve A small car, rendered in a 3D appearance, navigates through a swirling vortex of fiery particles. As it moves forward, the surrounding environment transforms into a dynamic display of red sparks that eventually evaporate into a burst of red sparks, creating a mesmerizing visual effect against the dark backdrop.
Tags
- text-to-video
- diffusers-training
- diffusers
- cogvideox
- cogvideox-diffusers
- template:sd-lora
Training Logs
The training logs are available on WandB.
License
This project is under the license.
Important Note
This is an experimental checkpoint, and its poor generalization is a known limitation.
Code repository: https://github.com/a-r-r-o-w/finetrainers