🚀 Diffusers - Text-to-Video
AnimateDiff is a method that lets you generate videos with pre-existing Stable Diffusion text-to-image models. It inserts motion module layers into a frozen text-to-image model and trains them on video clips to extract a motion prior. These motion modules are placed after the ResNet and Attention blocks in the Stable Diffusion UNet and introduce coherent motion across image frames. To support them, the concepts of a MotionAdapter and a UNetMotionModel are introduced, providing a convenient way to use these motion modules with existing Stable Diffusion models.
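As a rough illustration of how these pieces fit together, the sketch below combines a standard Stable Diffusion UNet with a MotionAdapter to build a UNetMotionModel. This is only a sketch: exact argument names such as motion_adapter may vary across diffusers versions, and in practice the AnimateDiffPipeline shown in the usage example below performs this wiring for you.

from diffusers import MotionAdapter, UNet2DConditionModel, UNetMotionModel

# Motion module weights trained on video clips
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-3")

# Frozen text-to-image UNet from an existing Stable Diffusion checkpoint
unet2d = UNet2DConditionModel.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE", subfolder="unet"
)

# Insert the motion modules into the UNet's blocks, producing a UNet
# that operates on batches of video frames
motion_unet = UNetMotionModel.from_unet2d(unet2d, motion_adapter=adapter)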
✨ Features
- Leverage Existing Models: Utilize pre-trained Stable Diffusion text-to-image models to generate videos.
- Motion Modules: Insert motion module layers to introduce coherent motion across frames.
- Convenient Integration: Use MotionAdapter and UNetMotionModel to easily integrate motion modules with existing models.
📦 Installation
AnimateDiff is part of the 🤗 Diffusers library. Installing diffusers together with torch, transformers, and accelerate (for example, pip install diffusers transformers accelerate) is typically all that is needed to run the examples below.
💻 Usage Examples
Basic Usage
import torch
from diffusers import AnimateDiffPipeline, EulerAncestralDiscreteScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Load the motion adapter that holds the pretrained motion module weights
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-3")

# Load a Stable Diffusion text-to-image checkpoint into an AnimateDiff pipeline
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)

# Use a scheduler with a linear beta schedule, loaded from the model's scheduler config
scheduler = EulerAncestralDiscreteScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    beta_schedule="linear",
)
pipe.scheduler = scheduler

# Memory-saving features: decode the VAE in slices and offload idle submodules to the CPU
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

# Generate 16 frames from the text prompt
output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)

# Save the generated frames as an animated GIF
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
📚 Documentation
AnimateDiff allows you to generate videos using pre-existing Stable Diffusion text-to-image models. The Basic Usage example above walks through the typical workflow with motion modules:
First, load the motion adapter and the Stable Diffusion text-to-image model. Then, set up the scheduler and enable the memory-saving features. Finally, generate the video frames from the prompt and save them as a GIF.
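The frames do not have to be saved as a GIF; diffusers.utils also provides export_to_video for writing an MP4 file. A minimal sketch, reusing the frames list produced in the example above; the fps value is an arbitrary choice, and export_to_video requires a video backend such as OpenCV or imageio to be installed:

from diffusers.utils import export_to_video

# Write the same frames to an MP4 file instead of a GIF
export_to_video(frames, "animation.mp4", fps=8)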
🔧 Technical Details
AnimateDiff achieves video generation by inserting motion module layers into a frozen text-to-image model. These motion modules are placed after the ResNet and Attention blocks in the Stable Diffusion UNet. Training them on video clips extracts a motion prior that introduces coherent motion across image frames. The MotionAdapter and UNetMotionModel classes make it convenient to use these motion modules with existing Stable Diffusion models.
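To see where the motion modules end up, you can inspect the pipeline's UNet after loading a MotionAdapter. The sketch below assumes the checkpoints from the usage example above; the motion_modules attribute name reflects current diffusers releases and may differ in other versions.

from diffusers import AnimateDiffPipeline, MotionAdapter, UNetMotionModel

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-3")
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE", motion_adapter=adapter
)

# The pipeline replaces the original 2D UNet with a UNetMotionModel
print(isinstance(pipe.unet, UNetMotionModel))  # True

# Print the motion module containers to see which UNet blocks hold them,
# e.g. down_blocks.0.motion_modules, up_blocks.1.motion_modules, ...
for name, _ in pipe.unet.named_modules():
    if name.endswith("motion_modules"):
        print(name)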
Visual Example
Prompt: "masterpiece, bestquality, sunset."