🚀 AnimateDiff: Transforming Text to Video with Stable Diffusion
AnimateDiff is a method that lets you generate videos using pre-existing Stable Diffusion text-to-image models. It inserts motion module layers into a frozen text-to-image model and trains them on video clips to extract a motion prior.
These motion modules are placed after the ResNet and Attention blocks in the Stable Diffusion UNet, where their main function is to introduce consistent motion across frames. To make these modules easy to use, we introduce the concepts of a MotionAdapter and a UNetMotionModel, which provide a convenient way to combine the motion modules with existing Stable Diffusion models.
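As a rough illustration of how these pieces fit together, the minimal sketch below wraps the spatial UNet of an existing Stable Diffusion checkpoint with a MotionAdapter to obtain a UNetMotionModel; the checkpoint names are only examples, and in practice the AnimateDiffPipeline shown later does this wiring for you.

from diffusers import MotionAdapter, UNet2DConditionModel, UNetMotionModel

# Load the spatial UNet from an existing Stable Diffusion checkpoint (example checkpoint)
unet2d = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
# Load the pretrained motion modules
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# Insert the motion modules into the frozen text-to-image UNet
unet_motion = UNetMotionModel.from_unet2d(unet2d, motion_adapter=adapter)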
✨ Features
- Leverage Existing Models: Utilize pre-trained Stable Diffusion text-to-image models to create videos.
- Motion Modules: Insert motion modules into the UNet of Stable Diffusion to introduce coherent motion between frames.
- Convenient Integration: Use MotionAdapter and UNetMotionModel to easily combine motion modules with existing models.
💻 Usage Examples
Basic Usage
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
# Load a fine-tuned Stable Diffusion model and combine it with the motion adapter
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(
    model_id, motion_adapter=adapter, torch_dtype=torch.float16
)
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    steps_offset=1,
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves, seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
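If you prefer an MP4 over a GIF, diffusers also provides an export_to_video helper. The snippet below is a small follow-up that reuses the frames from the example above and assumes a video backend (such as imageio or OpenCV) is installed.

from diffusers.utils import export_to_video

# Save the same frames as an MP4 instead of a GIF
export_to_video(frames, "animation.mp4", fps=8)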
📚 Documentation
The following table shows an example output of the AnimateDiff model:
| Prompt | Output |
| --- | --- |
| masterpiece, bestquality, sunset. | (example output GIF) |
💡 Usage Tip
AnimateDiff tends to work better with fine-tuned Stable Diffusion models. If you plan on using a scheduler that can clip samples, make sure to disable it by setting clip_sample=False in the scheduler, as sample clipping can have an adverse effect on the generated frames.
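If the pipeline has already been loaded with a different scheduler, one way to ensure sample clipping is off is to rebuild the scheduler from the pipeline's existing config. A minimal sketch, assuming the pipe object from the example above:

from diffusers import DDIMScheduler

# Rebuild the scheduler from the current config, explicitly disabling sample clipping
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config,
    clip_sample=False,
    timestep_spacing="linspace",
    steps_offset=1,
)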