# Wan-Fun

Welcome! This project focuses on text-to-video generation, offering a range of models and features for creating high-quality videos.
## Quick Start

### 1. Cloud Usage: AliyunDSW/Docker

#### a. Via Alibaba Cloud DSW
DSW provides free GPU time; each user can apply once, and the quota is valid for 3 months after approval.
Alibaba Cloud offers this free GPU time through Freetier. Claim it, use it in Alibaba Cloud PAI-DSW, and you can start CogVideoX-Fun within 5 minutes.
#### b. Via ComfyUI

For details on our ComfyUI interface, see the ComfyUI README.
#### c. Via Docker
If you use Docker, make sure the graphics card driver and CUDA environment are correctly installed on your machine, and then execute the following commands:
```sh
# pull image
docker pull mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun

# enter image
docker run -it -p 7860:7860 --network host --gpus all --security-opt seccomp:unconfined --shm-size 200g mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun

# clone code
git clone https://github.com/aigc-apps/VideoX-Fun.git

# enter VideoX-Fun's dir
cd VideoX-Fun

# download weights
mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model

# Please use the huggingface link or modelscope link to download the model.
# CogVideoX-Fun
# https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP
# https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP

# Wan
# https://huggingface.co/alibaba-pai/Wan2.1-Fun-V1.1-14B-InP
# https://modelscope.cn/models/PAI/Wan2.1-Fun-V1.1-14B-InP
```
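If you prefer to script the download instead of fetching from the links by hand, here is a minimal sketch using the `huggingface_hub` client (the repo ID comes from the links above; the target folder matches the Weight Placement layout below):

```python
# Minimal download sketch using huggingface_hub's snapshot_download.
from huggingface_hub import snapshot_download

# Fetch the full weight snapshot into the folder the scripts expect.
snapshot_download(
    repo_id="alibaba-pai/Wan2.1-Fun-V1.1-14B-InP",
    local_dir="models/Diffusion_Transformer/Wan2.1-Fun-V1.1-14B-InP",
)
```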
### 2. Local Installation: Environment Check/Download/Installation

#### a. Environment Check
We have verified that this library can be executed in the following environments:
Details for Windows:
- Operating System: Windows 10
- Python: python3.10 & python3.11
- PyTorch: torch2.2.0
- CUDA: 11.8 & 12.1
- CUDNN: 8+
- GPU: Nvidia-3060 12G & Nvidia-3090 24G
Details for Linux:
- Operating System: Ubuntu 20.04, CentOS
- Python: python3.10 & python3.11
- PyTorch: torch2.2.0
- CUDA: 11.8 & 12.1
- CUDNN: 8+
- GPU: Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G
You need approximately 60 GB of free disk space; please check before installing.
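A quick way to confirm the PyTorch/CUDA side of this checklist (a small sketch using standard PyTorch introspection calls):

```python
import torch

# Verified combinations: torch 2.2.0 with CUDA 11.8 or 12.1 and cuDNN 8+.
print("torch :", torch.__version__)
print("cuda  :", torch.version.cuda)
print("cudnn :", torch.backends.cudnn.version())
print("gpu   :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```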
#### b. Weight Placement

It is recommended to place the weights in the paths specified below.

Via ComfyUI, put the models into ComfyUI's weight folder `ComfyUI/models/Fun_Models/`:
```
📦 ComfyUI/
├── 📂 models/
│   └── 📂 Fun_Models/
│       ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│       ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│       ├── 📂 Wan2.1-Fun-V1.1-14B-InP/
│       └── 📂 Wan2.1-Fun-V1.1-1.3B-InP/
```
When running your own Python files or the UI interface, use:

```
📦 models/
├── 📂 Diffusion_Transformer/
│   ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│   ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│   ├── 📂 Wan2.1-Fun-V1.1-14B-InP/
│   └── 📂 Wan2.1-Fun-V1.1-1.3B-InP/
└── 📂 Personalized_Model/
    └── your trained transformer model / your trained lora model (for UI load)
```
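Before launching the UI, it can help to sanity-check that the folders match this layout (a small helper sketch; the model folder named here is just an example, so swap in whichever weights you downloaded):

```python
import os

# Folders the example scripts and UI expect, per the tree above.
expected = [
    "models/Diffusion_Transformer/Wan2.1-Fun-V1.1-1.3B-InP",  # example model folder
    "models/Personalized_Model",
]
for path in expected:
    print("ok     " if os.path.isdir(path) else "MISSING", path)
```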
## ✨ Features

### Model Addresses
V1.1:
| Name | Storage Space | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| Wan2.1-Fun-V1.1-1.3B-InP | 19.0 GB | 🤗Link | 😄Link | Weights for text-to-video generation of Wan2.1-Fun-V1.1-1.3B, trained at multiple resolutions, supporting prediction of start and end frames. |
| Wan2.1-Fun-V1.1-14B-InP | 47.0 GB | 🤗Link | 😄Link | Weights for text-to-video generation of Wan2.1-Fun-V1.1-14B, trained at multiple resolutions, supporting prediction of start and end frames. |
| Wan2.1-Fun-V1.1-1.3B-Control | 19.0 GB | 🤗Link | 😄Link | Video control weights of Wan2.1-Fun-V1.1-1.3B, supporting control conditions such as Canny, Depth, Pose, and MLSD, control with reference image + control conditions, and trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
| Wan2.1-Fun-V1.1-14B-Control | 47.0 GB | 🤗Link | 😄Link | Video control weights of Wan2.1-Fun-V1.1-14B, supporting control conditions such as Canny, Depth, Pose, and MLSD, control with reference images + control conditions, and trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
| Wan2.1-Fun-V1.1-1.3B-Control-Camera | 19.0 GB | 🤗Link | 😄Link | Camera lens control weights of Wan2.1-Fun-V1.1-1.3B. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
| Wan2.1-Fun-V1.1-14B-Control-Camera | 47.0 GB | 🤗Link | 😄Link | Camera lens control weights of Wan2.1-Fun-V1.1-14B. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
V1.0:
| Name | Storage Space | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| Wan2.1-Fun-1.3B-InP | 19.0 GB | 🤗Link | 😄Link | Weights for text-to-video generation of Wan2.1-Fun-1.3B, trained at multiple resolutions, supporting prediction of start and end frames. |
| Wan2.1-Fun-14B-InP | 47.0 GB | 🤗Link | 😄Link | Weights for text-to-video generation of Wan2.1-Fun-14B, trained at multiple resolutions, supporting prediction of start and end frames. |
| Wan2.1-Fun-1.3B-Control | 19.0 GB | 🤗Link | 😄Link | Video control weights of Wan2.1-Fun-1.3B, supporting control conditions such as Canny, Depth, Pose, and MLSD, and supporting trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
| Wan2.1-Fun-14B-Control | 47.0 GB | 🤗Link | 😄Link | Video control weights of Wan2.1-Fun-14B, supporting control conditions such as Canny, Depth, Pose, and MLSD, and supporting trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
### Video Works

#### Wan2.1-Fun-V1.1-14B-InP && Wan2.1-Fun-V1.1-1.3B-InP

#### Wan2.1-Fun-V1.1-14B-Control && Wan2.1-Fun-V1.1-1.3B-Control
Generic Control Video + Reference Image:

*(Demo grid: Reference Image | Control Video | Wan2.1-Fun-V1.1-14B-Control | Wan2.1-Fun-V1.1-1.3B-Control.)*
Generic Control Video (Canny, Pose, Depth, etc.) and Trajectory Control:
#### Wan2.1-Fun-V1.1-14B-Control-Camera && Wan2.1-Fun-V1.1-1.3B-Control-Camera

*(Camera-control demos: Pan Up | Pan Left | Pan Right | Pan Down | Pan Up + Pan Left | Pan Up + Pan Right.)*
## 💻 Usage Examples

### 1. Generation

#### a. Video Memory Saving Scheme
Since Wan2.1 has a very large number of parameters, we need a memory-saving scheme to accommodate consumer-grade graphics cards. Each prediction file provides a `GPU_memory_mode` option, which can be set to `model_cpu_offload`, `model_cpu_offload_and_qfloat8`, or `sequential_cpu_offload`. This scheme also applies to CogVideoX-Fun generation.

- `model_cpu_offload`: the entire model is moved to the CPU after use, saving some GPU memory.
- `model_cpu_offload_and_qfloat8`: the entire model is moved to the CPU after use, and the transformer model is quantized to float8, saving more GPU memory.
- `sequential_cpu_offload`: each layer of the model is moved to the CPU after use. It is slower but saves a large amount of GPU memory.

`qfloat8` slightly reduces model quality but saves more GPU memory. If GPU memory is sufficient, `model_cpu_offload` is recommended.
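As a rough illustration of how these modes map onto standard diffusers-style offloading calls (a minimal sketch, not the project's actual loading code: the generic `DiffusionPipeline` loader and the float8 cast are stand-ins, and a real qfloat8 path also needs dequantization at compute time):

```python
import torch
from diffusers import DiffusionPipeline

# Stand-in loader; VideoX-Fun builds its own pipeline classes.
pipe = DiffusionPipeline.from_pretrained(
    "models/Diffusion_Transformer/Wan2.1-Fun-V1.1-1.3B-InP",
    torch_dtype=torch.bfloat16,
)

GPU_memory_mode = "model_cpu_offload"

if GPU_memory_mode == "sequential_cpu_offload":
    # Each submodule visits the GPU only while it runs:
    # slowest option, lowest peak GPU memory.
    pipe.enable_sequential_cpu_offload()
elif GPU_memory_mode == "model_cpu_offload_and_qfloat8":
    # Illustrative float8 cast of the transformer weights; the real
    # qfloat8 mode must also dequantize during compute.
    pipe.transformer.to(torch.float8_e4m3fn)
    pipe.enable_model_cpu_offload()
elif GPU_memory_mode == "model_cpu_offload":
    # Whole submodels hop between CPU and GPU per use:
    # a good default when memory is nearly sufficient.
    pipe.enable_model_cpu_offload()
```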
#### b. Via ComfyUI

See the ComfyUI README for details.
#### c. Running Python Files

- Step 1: Download the corresponding weights and place them as described in the Weight Placement section above.
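For orientation, the prediction scripts are plain Python files in which you edit a few variables near the top and then run the file. A hedged sketch of what those settings look like (the script path and variable names are assumptions based on the options described above, not confirmed from the source):

```python
# Hypothetical settings near the top of a prediction script such as
# examples/wan2.1_fun/predict_t2v.py (exact path and names may differ):
prompt = "A panda eating bamboo in a sunlit forest."
negative_prompt = "blurry, distorted, low quality"
GPU_memory_mode = "model_cpu_offload"  # see the memory-saving scheme above
seed = 43

# Then run the script from the repository root, e.g.:
#   python examples/wan2.1_fun/predict_t2v.py
```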
## 📄 License

This project is licensed under the Apache-2.0 license.

