Cakeify-v0 Open-Source Video Generation Model - Transform Everyday Items into Cakes in Creative Videos

Cakeify V0

Developed by finetrainers

A video generation model based on THUDM/CogVideoX-5b and fine-tuned on the cakeify-smol dataset, specializing in creative videos that transform everyday objects into cakes

Text-to-Video | Open Source | License: Other | #Object-to-Cake Video Generation #Hyper-realistic Prop Transformation #Creative Visual Effects

Downloads 24

Release Time: 1/22/2025

Model Overview

This model can generate high-quality videos from text prompts, particularly excelling at the creative metamorphosis process of turning ordinary objects into hyper-realistic cakes

Model Features

Creative Object Transformation

Capable of transforming everyday objects (such as teacups, soap, etc.) into hyper-realistic cakes, showcasing the creative metamorphosis process

High-Quality Video Generation

Generates high-quality videos with a resolution of 768×512, featuring smooth 81-frame animations

LoRA Support

Provides a rank-64 LoRA extracted from the fine-tuned checkpoint, which can be loaded on top of the base model to emulate the same effect with a much smaller set of weights
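The extraction script for the LoRA is not reproduced on this page. As a rough illustration of what extracting a low-rank adapter from a fine-tuned checkpoint can look like, the sketch below assumes a truncated SVD of the weight difference, which is one common approach and not necessarily the one used for this model. The function name and the toy matrix shapes are hypothetical.

```python
import numpy as np

def extract_lora(w_base, w_tuned, rank):
    """Approximate (w_tuned - w_base) as lora_up @ lora_down with the given rank."""
    delta = w_tuned - w_base
    u, s, vh = np.linalg.svd(delta, full_matrices=False)
    # Keep the top `rank` singular directions and split the
    # singular values evenly between the two factors.
    sqrt_s = np.sqrt(s[:rank])
    lora_up = u[:, :rank] * sqrt_s             # shape: (out_features, rank)
    lora_down = sqrt_s[:, None] * vh[:rank]    # shape: (rank, in_features)
    return lora_up, lora_down

# Toy data (assumption): a base weight plus an update that is exactly rank 4.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(32, 48))
true_delta = rng.normal(size=(32, 4)) @ rng.normal(size=(4, 48))
w_tuned = w_base + true_delta

up, down = extract_lora(w_base, w_tuned, rank=4)
max_error = np.abs(up @ down - true_delta).max()
print(up.shape, down.shape, max_error)
```

In this toy case the update is exactly rank 4, so the factors reproduce it up to floating-point error. A real fine-tuning delta is generally not low rank, so a rank-64 extraction is an approximation of the full checkpoint.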

Model Capabilities

Text-to-video generation

Creative visual transformation

768×512 video output

Specific style video generation
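To give a feel for what a 768×512, 81-frame clip means for the underlying transformer, the following sketch computes the latent layout. The compression factors (8× spatial and 4× temporal in the VAE, 2×2 patches in the transformer) are assumptions about the CogVideoX-5b architecture and are not stated on this page.

```python
# Assumed CogVideoX-5b factors (not taken from this model card)
VAE_SPATIAL = 8     # VAE downsamples height and width by 8
VAE_TEMPORAL = 4    # VAE compresses time by 4, keeping the first frame separately
PATCH_SIZE = 2      # transformer groups 2x2 latent pixels into one token

def latent_layout(num_frames, height, width):
    """Return the latent grid and video token count for one clip."""
    latent_frames = (num_frames - 1) // VAE_TEMPORAL + 1
    latent_h = height // VAE_SPATIAL
    latent_w = width // VAE_SPATIAL
    tokens = latent_frames * (latent_h // PATCH_SIZE) * (latent_w // PATCH_SIZE)
    return {
        "latent_frames": latent_frames,
        "latent_height": latent_h,
        "latent_width": latent_w,
        "video_tokens": tokens,
    }

layout = latent_layout(num_frames=81, height=512, width=768)
print(layout)
```

Under these assumptions the settings used in the usage examples map to 21 latent frames on a 64×96 grid, so attention cost grows quickly if the frame count or resolution is raised.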

Use Cases

Creative Content Production

Object-to-Cake Video

Creative videos transforming everyday objects like teacups or soap into cakes

Generates videos showing the entire process: cutting the object, revealing the cake inside, and the final transformation

Advertising Creative Production

Creating eye-catching advertisement videos for food or creative products

Produces unexpected product showcase effects

Social Media Content

Short Video Creativity

Creating 3-5 second creative short videos for social media platforms

Generates eye-catching visual content aimed at driving viewer engagement
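The clip length follows directly from the frame count and the export frame rate. The small sketch below uses the 81 frames and 25 fps from the usage examples on this page; the 16 fps value is only a hypothetical alternative to show how the duration changes.

```python
def clip_duration_seconds(num_frames, fps):
    """Playback length of a clip exported at a fixed frame rate."""
    return num_frames / fps

default_duration = clip_duration_seconds(81, 25)   # settings from the usage examples
slower_duration = clip_duration_seconds(81, 16)    # hypothetical lower frame rate
print(default_duration, slower_duration)
```

With the documented settings the clip runs a little over three seconds; exporting the same frames at a lower frame rate stretches it toward five seconds at the cost of slower apparent motion.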

🚀 Cakeify - Fine-tuned CogVideoX-5b Model

This project is a fine-tuned version of the THUDM/CogVideoX-5b model on the finetrainers/cakeify-smol dataset. It offers a creative way to transform everyday objects into hyper-realistic prop cakes in video generation.

🚀 Quick Start

Model Information

Property	Details
Model Type	Fine-tuned version of THUDM/CogVideoX-5b
Training Data	finetrainers/cakeify-smol
Library Name	diffusers
License	other

Instance Prompt

PIKA_CAKEIFY A red tea cup is placed on a wooden surface. Suddenly, a knife appears and slices through the cup, revealing a cake inside. The cake turns into a hyper-realistic prop cake, showcasing the creative transformation of everyday objects into something unexpected and delightful.
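Every example prompt on this page starts with the PIKA_CAKEIFY token and follows the same three-beat structure (object on a surface, knife slices it, cake is revealed). The helper below is a hypothetical convenience for producing prompts in that shape; its name and parameters are not part of the model or of diffusers, and whether the token is strictly required is not stated in the source.

```python
TRIGGER = "PIKA_CAKEIFY"  # token that prefixes every example prompt on this page

def build_cakeify_prompt(subject, short_name, setting):
    """Fill the instance-prompt template (hypothetical helper).

    subject:    full description with article, e.g. "A red tea cup"
    short_name: how the object is referred to later, e.g. "cup"
    setting:    where it sits, with article, e.g. "a wooden surface"
    """
    return (
        f"{TRIGGER} {subject} is placed on {setting}. "
        f"Suddenly, a knife appears and slices through the {short_name}, "
        "revealing a cake inside. "
        "The cake turns into a hyper-realistic prop cake, showcasing the creative "
        "transformation of everyday objects into something unexpected and delightful."
    )

prompt = build_cakeify_prompt("A red tea cup", "cup", "a wooden surface")
print(prompt)
```

Staying close to the wording of the training-style prompts is probably the safer choice, given the note below that this checkpoint generalizes poorly.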

Gallery

Examples

Input Text	Output Video
PIKA_CAKEIFY A blue soap is placed on a modern table. Suddenly, a knife appears and slices through the soap, revealing a cake inside. The soap turns into a hyper-realistic prop cake, showcasing the creative transformation of everyday objects into something unexpected and delightful.	Video
PIKA_CAKEIFY On a gleaming glass display stand, a sleek black purse quietly commands attention. Suddenly, a knife appears and slices through the shoe, revealing a fluffy vanilla sponge at its core. Immediately, it turns into a hyper-realistic prop cake, delighting the senses with its playful juxtaposition of the everyday and the extraordinary.	Video
PIKA_CAKEIFY A red tea cup is placed on a wooden surface. Suddenly, a knife appears and slices through the cup, revealing a cake inside. The cake turns into a hyper-realistic prop cake, showcasing the creative transformation of everyday objects into something unexpected and delightful.	Video

Code Repository

Code: https://github.com/a-r-r-o-w/finetrainers

⚠️ Important Note

This is an experimental checkpoint, and it is known to generalize poorly.

💻 Usage Examples

Basic Usage

import torch
from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video

# Load the fine-tuned transformer and swap it into the base CogVideoX-5b pipeline
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "finetrainers/cakeify-v0", torch_dtype=torch.bfloat16
)
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# Prompts start with the PIKA_CAKEIFY token, as in the examples above
prompt = """
PIKA_CAKEIFY On a gleaming glass display stand, a sleek black purse quietly commands attention. Suddenly, a knife appears and slices through the shoe, revealing a fluffy vanilla sponge at its core. Immediately, it turns into a hyper-realistic prop cake, delighting the senses with its playful juxtaposition of the everyday and the extraordinary.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

# Generate 81 frames at 768x512 and take the first (only) video in the batch
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=25)
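The basic example keeps the whole pipeline on the GPU and samples with a random seed. The variant below is a sketch of how the same run could be made reproducible and lighter on GPU memory, using the `generator` argument, `enable_model_cpu_offload()` (which needs the accelerate package) and VAE tiling from diffusers. The helper name, its defaults, and the choice of options are assumptions for illustration; actual memory savings were not measured here.

```python
# Sampling settings copied from the basic usage example
SAMPLING_KWARGS = {
    "num_frames": 81,
    "height": 512,
    "width": 768,
    "num_inference_steps": 50,
}

def generate_cakeify_video(prompt, negative_prompt=None, seed=0, output_path="output.mp4"):
    """Hypothetical helper: seeded, lower-VRAM variant of the basic example."""
    # Heavy imports stay inside the function so that defining it is cheap.
    import torch
    from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
    from diffusers.utils import export_to_video

    transformer = CogVideoXTransformer3DModel.from_pretrained(
        "finetrainers/cakeify-v0", torch_dtype=torch.bfloat16
    )
    pipeline = DiffusionPipeline.from_pretrained(
        "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
    )
    # Move submodules to the GPU only while they run, instead of .to("cuda")
    pipeline.enable_model_cpu_offload()
    # Decode the video in tiles to lower peak memory in the VAE
    pipeline.vae.enable_tiling()

    # A fixed seed makes repeated runs with the same settings comparable
    generator = torch.Generator(device="cpu").manual_seed(seed)
    video = pipeline(
        prompt=prompt,
        negative_prompt=negative_prompt,
        generator=generator,
        **SAMPLING_KWARGS,
    ).frames[0]
    export_to_video(video, output_path, fps=25)
    return output_path
```

Calling `generate_cakeify_video(prompt, negative_prompt, seed=42)` with the prompt strings from the basic example would write `output.mp4`. Offloading trades speed for memory, so on a GPU with enough VRAM the original `.to("cuda")` version is likely faster.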

Training Logs

Training logs are available on WandB here.

📚 Documentation

LoRA

We extracted a 64-rank LoRA from the finetuned checkpoint (script here). This LoRA can be used to emulate the same kind of effect:

Code

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the unmodified base model, then apply the extracted rank-64 LoRA on top
pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
pipeline.load_lora_weights("finetrainers/cakeify-v0", weight_name="extracted_cakeify_lora_64.safetensors")

prompt = """
PIKA_CAKEIFY On a gleaming glass display stand, a sleek black purse quietly commands attention. Suddenly, a knife appears and slices through the shoe, revealing a fluffy vanilla sponge at its core. Immediately, it turns into a hyper-realistic prop cake, delighting the senses with its playful juxtaposition of the everyday and the extraordinary.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

# Same sampling settings as the full fine-tuned checkpoint
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output_lora.mp4", fps=25)
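Because the LoRA sits on top of the base weights, its strength can be adjusted without reloading the model. The sketch below names the adapter when loading it and then sets its weight with `set_adapters`, which relies on the PEFT-backed LoRA support in diffusers. The adapter name, the helper, and the idea that a scale below 1.0 gives a softer effect are assumptions; the source does not document a recommended value.

```python
ADAPTER_NAME = "cakeify"  # hypothetical label chosen for this sketch

def load_cakeify_lora_pipeline(lora_scale=1.0):
    """Hypothetical helper: base CogVideoX-5b with the extracted LoRA at a given strength."""
    # Heavy imports stay inside the function so that defining it is cheap.
    import torch
    from diffusers import DiffusionPipeline

    pipeline = DiffusionPipeline.from_pretrained(
        "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
    ).to("cuda")
    pipeline.load_lora_weights(
        "finetrainers/cakeify-v0",
        weight_name="extracted_cakeify_lora_64.safetensors",
        adapter_name=ADAPTER_NAME,
    )
    # 1.0 applies the LoRA at full strength; smaller values blend toward the base model
    pipeline.set_adapters([ADAPTER_NAME], [lora_scale])
    return pipeline
```

The returned pipeline can be called with the same prompt and sampling arguments as in the LoRA example above, for instance `load_cakeify_lora_pipeline(lora_scale=0.8)`.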