Ruyi-Mini-7B Open-Source Image-to-Video Model - Freely Convert Images into 360p

Ruyi Mini 7B

Developed by IamCreateAI

Open-source image-to-video generation model supporting 360p to 720p resolution with up to 5-second video generation

Video Processing | English | Open Source License: Apache-2.0 | #Image to Video #Dynamic Camera Control #Multi-Resolution Support

Downloads 437

Release Time : 12/16/2024

Model Overview

Ruyi-Mini-7B is an open-source image-to-video generation model capable of generating subsequent video frames from input images, supporting various aspect ratios and camera controls.

Model Features

Multi-Resolution Support

Supports video generation from 360p to 720p resolution, adaptable to various aspect ratios

Enhanced Motion Control

Provides camera movement control and motion amplitude adjustment features

Long Video Generation

Capable of generating continuous video frames up to 5 seconds long

High-Quality Output

Optimized through multi-stage training for higher quality video generation

Model Capabilities

Image to Video

Video Frame Prediction

Camera Motion Control

Multi-Resolution Generation

Use Cases

Content Creation

Static Image Animation

Convert static photos into dynamic videos

Generate smooth animation effects lasting 3-5 seconds

Short Video Production

Create short video content for social media

Quickly generate short video clips suitable for social platforms

Creative Design

Concept Visualization

Transform design concept images into dynamic presentations

Help clients better understand design intentions

🚀 Ruyi-Mini-7B

Ruyi-Mini-7B is an open-source image-to-video generation model developed by CreateAI. It generates video frames from an input image at resolutions from 360p to 720p, supports various aspect ratios, and produces clips of up to 5 seconds. Motion and camera control give more flexibility and creative range in video generation.

Hugging Face | Github

🚀 Quick Start

Ruyi-Mini-7B is an open-source image-to-video generation model. Starting from an input image, it produces subsequent video frames at resolutions ranging from 360p to 720p, supporting various aspect ratios and a maximum duration of 5 seconds.

✨ Features

Generate video frames from an input image with resolutions from 360p to 720p.
Support various aspect ratios and a maximum video duration of 5 seconds.
Enhanced with motion and camera control for more flexible and creative video generation.

📦 Installation

Install the code from GitHub:

git clone https://github.com/IamCreateAI/Ruyi-Models
cd Ruyi-Models
pip install -r requirements.txt

💻 Usage Examples

Basic Usage

We provide two ways to run the model. The first is to run the Python script directly:

python3 predict_i2v.py

Advanced Usage

The second is to use the ComfyUI wrapper in our GitHub repo.

📚 Documentation

Model Architecture

Ruyi-Mini-7B is an advanced image-to-video model with about 7.1 billion parameters. The model architecture is modified from the [EasyAnimate V4 model](https://github.com/aigc-apps/EasyAnimate), whose transformer module is inherited from HunyuanDiT. It comprises three key components:

Causal VAE Module: Handles video compression and decompression. It reduces spatial resolution to 1/8 and temporal resolution to 1/4; after compression, each latent pixel is represented by 16 floating-point numbers.
Diffusion Transformer Module: Generates compressed video data using 3D full attention, with:
- 2D Normalized-RoPE for spatial dimensions;
- Sin-cos position embedding for temporal dimensions;
- DDPM (Denoising Diffusion Probabilistic Models) for model training.
Ruyi also uses a CLIP model to extract semantic features from the input image, which guide the whole video generation. The CLIP features are introduced into the transformer through cross-attention.
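As a rough illustration of the compression figures above, the sketch below computes the latent tensor shape and the resulting compression ratio for one clip. It is not taken from the Ruyi code base: the function names, the height x width x frames reading of the video size, and the round-up behaviour for sizes that do not divide evenly are all assumptions made for illustration.

```python
import math

# Compression factors stated in the model card.
SPATIAL_FACTOR = 8      # height and width are reduced to 1/8
TEMPORAL_FACTOR = 4     # frame count is reduced to 1/4
LATENT_CHANNELS = 16    # floats per latent pixel


def latent_shape(height, width, frames):
    """Return (latent_frames, latent_height, latent_width, channels).

    Assumption: sizes that do not divide evenly are rounded up (padding).
    The real VAE may treat such sizes, or the first frame, differently.
    """
    return (
        math.ceil(frames / TEMPORAL_FACTOR),
        math.ceil(height / SPATIAL_FACTOR),
        math.ceil(width / SPATIAL_FACTOR),
        LATENT_CHANNELS,
    )


def compression_ratio(height, width, frames, rgb_channels=3):
    """Ratio of raw RGB values to latent values for one clip."""
    t, h, w, c = latent_shape(height, width, frames)
    return (frames * height * width * rgb_channels) / (t * h * w * c)


if __name__ == "__main__":
    print(latent_shape(720, 1280, 120))       # (30, 90, 160, 16)
    print(compression_ratio(720, 1280, 120))  # 48.0
```

Under these assumptions a 720x1280x120 clip shrinks by a factor of 48 before the diffusion transformer sees it, which is what makes 3D full attention over the whole clip tractable.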

Training Data and Methodology

The training process is divided into four phases:

Phase 1: Pre-training from scratch with ~200M video clips and ~30M images at 256 resolution, using a batch size of 4096 for 350,000 iterations to achieve full convergence.
Phase 2: Fine-tuning with ~60M video clips for multi-scale resolutions (384–512), with a batch size of 1024 for 60,000 iterations.
Phase 3: High-quality fine-tuning with ~20M video clips and ~8M images for 384–1024 resolutions, with dynamic batch sizes based on memory and 10,000 iterations.
Phase 4: Image-to-video training with ~10M curated high-quality video clips, with dynamic batch sizes based on memory for ~10,000 iterations.
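To put the four phases side by side, the sketch below records them as data and estimates how many samples each phase processes. The `Phase` class and its field names are hypothetical. It assumes one batch of samples per iteration and treats video clips and images as a single pool; phases with dynamic batch sizes are left as unknown rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phase:
    name: str
    clips_m: float              # video clips, in millions (approximate)
    images_m: float             # images, in millions (approximate)
    batch_size: Optional[int]   # None means "dynamic", as stated in the card
    iterations: int

    def samples_seen(self) -> Optional[int]:
        """Samples processed, assuming one full batch per iteration."""
        if self.batch_size is None:
            return None
        return self.batch_size * self.iterations

    def approx_epochs(self) -> Optional[float]:
        """Rough passes over the data, pooling clips and images together."""
        seen = self.samples_seen()
        if seen is None:
            return None
        return seen / ((self.clips_m + self.images_m) * 1_000_000)


PHASES = [
    Phase("pre-training (256)", 200, 30, 4096, 350_000),
    Phase("multi-scale fine-tuning (384-512)", 60, 0, 1024, 60_000),
    Phase("high-quality fine-tuning (384-1024)", 20, 8, None, 10_000),
    Phase("image-to-video training", 10, 0, None, 10_000),
]

if __name__ == "__main__":
    for phase in PHASES:
        print(phase.name, phase.samples_seen(), phase.approx_epochs())
```

With these numbers, Phase 1 processes about 1.43 billion samples (roughly six passes over ~230M items), while Phase 2 processes about 61 million, close to a single pass over its ~60M clips.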

Hardware Requirements

The VRAM cost of Ruyi depends on the resolution and duration of the video. Costs for some typical video sizes, tested on a single A100, are listed below.

Property	Details
Model Type	Image-to-video generation model
Training Data	~200M video clips, ~30M images for pre-training; ~60M video clips for fine-tuning; ~20M video clips and ~8M images for high-quality fine-tuning; ~10M curated high-quality video clips for image-to-video training

Video Size	360x480x120	384x672x120	480x640x120	630x1120x120	720x1280x120
Memory	21.5GB	25.5GB	27.7GB	44.9GB	54.8GB
Time	03:10	05:29	06:49	24:18	39:02

For 24GB VRAM cards such as the RTX 4090, we provide low_gpu_memory_mode, under which the model can generate 720x1280x120 videos at the cost of longer generation time.
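The memory table above can be turned into a small lookup for choosing a video size that fits a given card. This is a sketch only: the `sizes_that_fit` helper and its `headroom_gb` parameter are hypothetical, the figures are the single-A100 measurements from the table, and the savings from low_gpu_memory_mode are not modeled because the card does not quantify them.

```python
# Memory in GB per video size, copied from the table above (single A100).
VRAM_GB = {
    "360x480x120": 21.5,
    "384x672x120": 25.5,
    "480x640x120": 27.7,
    "630x1120x120": 44.9,
    "720x1280x120": 54.8,
}


def sizes_that_fit(vram_gb, headroom_gb=0.0):
    """List the video sizes whose measured memory fits the given budget.

    headroom_gb reserves memory for other processes; it is an illustrative
    knob, not a Ruyi setting.
    """
    budget = vram_gb - headroom_gb
    return [size for size, need in VRAM_GB.items() if need <= budget]


if __name__ == "__main__":
    print(sizes_that_fit(24))   # ['360x480x120']
    print(sizes_that_fit(80))   # all five sizes
```

By this measure a 24GB card fits only the smallest listed size in the default mode, which is why larger sizes on such cards rely on low_gpu_memory_mode.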

Showcase

Image to Video Effects

Camera Control

Panels: input, left, right, static, up, down

Motion Amplitude Control

Panels: motion 1, motion 2, motion 3, motion 4

Limitations

This experimental release has some known limitations. Text, hands, and crowded human faces may be distorted. The video may cut to another scene when the model does not know how to generate future frames. We are still working on these problems and will update the model as we make progress.

Update

Dec 24, 2024: The diffusion model was updated to fix the black lines that appeared when creating 3:4 or 4:5 videos.
Dec 16, 2024: Ruyi-Mini-7B was released.

🔧 Technical Details

The model architecture and training details are described in the "Model Architecture" and "Training Data and Methodology" sections above.

📄 License

This model is released under the permissive Apache 2.0 license.

BibTeX

@misc{createai2024ruyi,
      title={Ruyi-Mini-7B},
      author={CreateAI Team},
      year={2024},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished={\url{https://github.com/IamCreateAI/Ruyi-Models}}
}

Contact Us

You are welcome to join our Discord or WeChat group (scan the QR code to add Ruyi Assistant and join the official group) for further discussion!
