# Wan-Fun

Welcome! This project focuses on text-to-video generation, offering a range of models and features for creating high-quality videos.


English | Simplified Chinese
## Quick Start

### 1. Cloud Usage: AliyunDSW/Docker

#### a. Through Alibaba Cloud DSW

DSW provides free GPU hours; a user can apply once, and the quota remains valid for three months after approval.

Alibaba Cloud offers free GPU hours through FreeTier. Claim them and use them in Alibaba Cloud PAI-DSW, and you can start CogVideoX-Fun within five minutes.

#### b. Through ComfyUI

Our ComfyUI interface is shown below. For details, see the ComfyUI README.

#### c. Through Docker

If you use Docker, make sure the GPU driver and CUDA environment are correctly installed on your machine, then execute the following commands in sequence:

```shell
# pull image
docker pull mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun
# enter image
docker run -it -p 7860:7860 --network host --gpus all --security-opt seccomp:unconfined --shm-size 200g mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun
# clone code
git clone https://github.com/aigc-apps/VideoX-Fun.git
# enter VideoX-Fun's dir
cd VideoX-Fun
# download weights
mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model
# Please use the Hugging Face link or ModelScope link to download the model.
# CogVideoX-Fun
# https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP
# https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP
# Wan
# https://huggingface.co/alibaba-pai/Wan2.1-Fun-V1.1-14B-InP
# https://modelscope.cn/models/PAI/Wan2.1-Fun-V1.1-14B-InP
```
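To fetch one of the checkpoints listed above programmatically, here is a minimal sketch using `huggingface_hub` (assuming it is installed; ModelScope's Python SDK offers a similar `snapshot_download`). The repo ID and target directory come from the links and folder layout in this README:

```python
# Minimal sketch: download Wan2.1-Fun-V1.1-14B-InP from Hugging Face into the
# folder layout this project expects. Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="alibaba-pai/Wan2.1-Fun-V1.1-14B-InP",
    local_dir="models/Diffusion_Transformer/Wan2.1-Fun-V1.1-14B-InP",
)
```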
### 2. Local Installation: Environment Check/Download/Installation

#### a. Environment Check

We have verified that this library runs in the following environments.

Details for Windows:
- Operating System: Windows 10
- Python: python3.10 & python3.11
- PyTorch: torch2.2.0
- CUDA: 11.8 & 12.1
- CUDNN: 8+
- GPU: Nvidia 3060 12G & Nvidia 3090 24G

Details for Linux:
- Operating System: Ubuntu 20.04, CentOS
- Python: python3.10 & python3.11
- PyTorch: torch2.2.0
- CUDA: 11.8 & 12.1
- CUDNN: 8+
- GPU: Nvidia V100 16G & Nvidia A10 24G & Nvidia A100 40G & Nvidia A100 80G

Approximately 60 GB of free disk space is required; please check before installing.
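As a quick sanity check, here is a minimal sketch that prints the PyTorch, CUDA, GPU, and disk-space details covered above (the version numbers in the comments are the ones this README lists, not requirements of the snippet itself):

```python
# Minimal sketch: verify PyTorch, CUDA, GPU, and free disk space.
import shutil

import torch

print("torch:", torch.__version__)          # README lists torch2.2.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)  # README lists 11.8 / 12.1
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.0f} GB VRAM")

free_gb = shutil.disk_usage(".").free / 1024**3
print(f"Free disk space: {free_gb:.0f} GB")  # ~60 GB needed for the weights
```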
#### b. Weight Placement

It is recommended to place the weights in the specified paths.

Through ComfyUI, place the models in ComfyUI's weight folder `ComfyUI/models/Fun_Models/`:

```
📦 ComfyUI/
├── 📂 models/
│   └── 📂 Fun_Models/
│       ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│       ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│       ├── 📂 Wan2.1-Fun-V1.1-14B-InP/
│       └── 📂 Wan2.1-Fun-V1.1-1.3B-InP/
```
When running your own Python file or the UI interface:

```
📦 models/
├── 📂 Diffusion_Transformer/
│   ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│   ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│   ├── 📂 Wan2.1-Fun-V1.1-14B-InP/
│   └── 📂 Wan2.1-Fun-V1.1-1.3B-InP/
└── 📂 Personalized_Model/
    └── your trained transformer model / your trained LoRA model (for UI load)
```
## Model Address

### V1.1

| Name | Storage Space | Hugging Face | ModelScope | Description |
|--|--|--|--|--|
| Wan2.1-Fun-V1.1-1.3B-InP | 19.0 GB | 🤗Link | 😄Link | The text-to-video weights of Wan2.1-Fun-V1.1-1.3B, trained at multiple resolutions and supporting first- and last-frame prediction. |
| Wan2.1-Fun-V1.1-14B-InP | 47.0 GB | 🤗Link | 😄Link | The text-to-video weights of Wan2.1-Fun-V1.1-14B, trained at multiple resolutions and supporting first- and last-frame prediction. |
| Wan2.1-Fun-V1.1-1.3B-Control | 19.0 GB | 🤗Link | 😄Link | The video control weights of Wan2.1-Fun-V1.1-1.3B, supporting control conditions such as Canny, Depth, Pose, and MLSD, as well as reference image + control conditions and trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
| Wan2.1-Fun-V1.1-14B-Control | 47.0 GB | 🤗Link | 😄Link | The video control weights of Wan2.1-Fun-V1.1-14B, supporting control conditions such as Canny, Depth, Pose, and MLSD, as well as reference image + control conditions and trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
| Wan2.1-Fun-V1.1-1.3B-Control-Camera | 19.0 GB | 🤗Link | 😄Link | The camera lens control weights of Wan2.1-Fun-V1.1-1.3B. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
| Wan2.1-Fun-V1.1-14B-Control-Camera | 47.0 GB | 🤗Link | 😄Link | The camera lens control weights of Wan2.1-Fun-V1.1-14B. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
### V1.0

| Name | Storage Space | Hugging Face | ModelScope | Description |
|--|--|--|--|--|
| Wan2.1-Fun-1.3B-InP | 19.0 GB | 🤗Link | 😄Link | The text-to-video weights of Wan2.1-Fun-1.3B, trained at multiple resolutions and supporting first- and last-frame prediction. |
| Wan2.1-Fun-14B-InP | 47.0 GB | 🤗Link | 😄Link | The text-to-video weights of Wan2.1-Fun-14B, trained at multiple resolutions and supporting first- and last-frame prediction. |
| Wan2.1-Fun-1.3B-Control | 19.0 GB | 🤗Link | 😄Link | The video control weights of Wan2.1-Fun-1.3B, supporting control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
| Wan2.1-Fun-14B-Control | 47.0 GB | 🤗Link | 😄Link | The video control weights of Wan2.1-Fun-14B, supporting control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 81 frames at 16 frames per second, and supports multi-language prediction. |
## Video Works

### Wan2.1-Fun-V1.1-14B-InP && Wan2.1-Fun-V1.1-1.3B-InP

### Wan2.1-Fun-V1.1-14B-Control && Wan2.1-Fun-V1.1-1.3B-Control

Generic Control Video + Reference Image:

[Example videos omitted; columns show Reference Image, Control Video, Wan2.1-Fun-V1.1-14B-Control, and Wan2.1-Fun-V1.1-1.3B-Control.]
Generic Control Video (Canny, Pose, Depth, etc.) and Trajectory Control:

[Example videos omitted.]

### Wan2.1-Fun-V1.1-14B-Control-Camera && Wan2.1-Fun-V1.1-1.3B-Control-Camera

[Example videos omitted; camera motions shown include Pan Up, Pan Left, Pan Right, Pan Down, Pan Up + Pan Left, and Pan Up + Pan Right.]
## How to Use

### 1. Generation

#### a. GPU Memory Saving Scheme

Because Wan2.1 has a very large parameter count, we need a memory-saving scheme to fit consumer-grade GPUs. Each prediction file provides a `GPU_memory_mode` option, which can be set to `model_cpu_offload`, `model_cpu_offload_and_qfloat8`, or `sequential_cpu_offload`. This scheme also applies to CogVideoX-Fun generation.

- `model_cpu_offload`: the entire model is moved to the CPU after use, saving some GPU memory.
- `model_cpu_offload_and_qfloat8`: the entire model is moved to the CPU after use, and the transformer is quantized to float8, saving more GPU memory.
- `sequential_cpu_offload`: each layer of the model is moved to the CPU after use. It is slower but saves the most GPU memory.

`qfloat8` slightly reduces model quality but saves more GPU memory. If GPU memory is sufficient, `model_cpu_offload` is recommended.
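As an illustration only, the three modes map naturally onto the offload helpers of a diffusers-style pipeline. The pipeline class and checkpoint path below are placeholders rather than this project's actual API, and the float8 step is left as a comment because the project uses its own quantization helper:

```python
# Illustrative sketch of the three GPU_memory_mode options on a
# diffusers-style pipeline; class and path are placeholders.
import torch
from diffusers import DiffusionPipeline

GPU_memory_mode = "model_cpu_offload"  # or "model_cpu_offload_and_qfloat8"
                                       # or "sequential_cpu_offload"

pipe = DiffusionPipeline.from_pretrained(
    "models/Diffusion_Transformer/Wan2.1-Fun-V1.1-1.3B-InP",
    torch_dtype=torch.bfloat16,
)

if GPU_memory_mode == "sequential_cpu_offload":
    # Offload layer by layer: slowest option, largest memory savings.
    pipe.enable_sequential_cpu_offload()
elif GPU_memory_mode == "model_cpu_offload_and_qfloat8":
    # Offload whole sub-models after use; the transformer would additionally
    # be quantized to float8 here (the project ships its own helper for that).
    pipe.enable_model_cpu_offload()
else:  # "model_cpu_offload"
    # Offload each sub-model after use: recommended when memory allows.
    pipe.enable_model_cpu_offload()
```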
#### b. Through ComfyUI

For details, see the ComfyUI README.

#### c. Running Python Files

- Step 1: Download the corresponding weights (see the Model Address table above) and place them as described in the Weight Placement section.
## License

This project is licensed under the Apache-2.0 License.