# Text-to-Video Generation

Skyreels V2 T2V 14B 720P VACE GGUF

SkyReels-V2 is a 14B-parameter text-to-video generation model that supports 720P resolution output and includes VACE functionality.

Text-to-Video English

Wan2.1 VACE 1.3B GGUF

A direct GGUF conversion of Wan2.1-VACE-1.3B, an open-source video foundation model that runs on consumer-grade GPUs and handles a range of video generation tasks.

Text-to-Video English

samuelchristlie

Moviigen1.1 VACE GGUF

An experimental GGUF conversion of ZuluVision/MoviiGen1.1, integrated with VACE for text-to-video tasks.

Wan2.1 T2V 1.3B GGUF

A direct GGUF conversion of Wan2.1-T2V-1.3B, suitable for text-to-video generation on consumer-grade GPUs.

Text-to-Video English

samuelchristlie

Wan2.1 VACE 14B GGUF

A GGUF quantized conversion of the Wan-AI/Wan2.1-VACE-14B model, primarily designed for text-to-video generation.

Moviigen1.1 GGUF

A GGUF conversion of the MoviiGen1.1 video generation model, supporting text-to-video tasks.

Video Processing

Ltxv 13b 0.9.7 Distilled GGUF

LTX-Video is a text-to-video generation model that supports creating video content from text or images.

Text-to-Video English

Wan2.1 T2V 14B CausVid GGUF

A GGUF conversion of the Wan-AI/Wan2.1-T2V-14B model, primarily used for text-to-video generation.

Text-to-Video English

Ltxv 13b 0.9.7 Dev GGUF

GGUF quantized version of the 13b-0.9.7-dev variant based on Lightricks/LTX-Video, supporting text-to-video and image-to-video generation tasks.

Text-to-Video English

GGUF quantized versions of the Lightricks/LTX-Video model, including development and distilled editions, designed for text-to-video generation tasks.

Text-to-Video English

Skyreels V2 T2V 14B 540P GGUF

SkyReels-V2 is a 14B-parameter text-to-video generation model that supports 540P resolution video generation.

Video Processing

Wan2.1 Fun 14B Control Gguf

A 14B-parameter multimodal model released by Alibaba PAI, supporting text-to-video generation tasks.

Text-to-Video Supports Multiple Languages

Wan2.1 Fun 14B InP Gguf

A 14B-parameter multimodal model released by Alibaba PAI, supporting text-to-video generation tasks.

Text-to-Video Supports Multiple Languages

This is a GGUF quantized version based on Wan-AI/Wan2.1-T2V-1.3B, specifically designed for text-to-video generation tasks, compatible with comfyui-gguf and gguf nodes.

Text-to-Video English

LTX-Video is a text-to-video model that generates video content from input text descriptions.

Text-to-Video English

The GGUF quantized version of Mochi is a text-to-video generation model that includes a GGUF encoder and GGUF variational autoencoder, suitable for fast video content generation.

Text-to-Video English

The GGUF quantized version of Wan Video is a text-to-video generation model suitable for older or low-end machines, supporting efficient inference via GGUF files.

Text-to-Video English

Wan2.1 T2V 14B Gguf

A text-to-video generation model converted to GGUF format, usable through the ComfyUI-GGUF custom nodes.

Skyreels V1 Hunyuan I2V HFIE

SkyReels-V1-Hunyuan-I2V is a video generation model developed by SkyworkAI and based on Tencent's Hunyuan architecture. As the I2V suffix indicates, it targets image-to-video generation guided by a text prompt.

Text-to-Video English

AnimateLCM is a diffusion model-based text-to-video generation system capable of producing high-quality short video clips from text descriptions.

Mochi 1 Transformer 42

A distilled version of the genmoai mochi-1 transformer with 42 modules (the original has 48), made lighter by iteratively removing the modules with the smallest MSE values.

Text-to-Video English
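
As a rough illustration of the pruning idea described for Mochi 1 Transformer 42, the sketch below greedily drops whichever module changes the model output least, measured by MSE on a few calibration inputs. It is a toy in plain Python, not the procedure used for this checkpoint: the names `prune_blocks` and `make_residual_block`, the list-of-floats activations, and the choice to compare against the current model's output are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# A "block" here is any function mapping an activation vector to a new one.
Block = Callable[[List[float]], List[float]]


def run(blocks: Sequence[Block], x: List[float]) -> List[float]:
    """Apply the blocks in order to the input vector."""
    for block in blocks:
        x = block(x)
    return x


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def prune_blocks(
    blocks: Sequence[Block],
    calibration: Sequence[List[float]],
    target_len: int,
) -> List[Block]:
    """Greedily remove blocks until only target_len remain.

    At each step, the block whose removal yields the smallest average MSE
    against the current model's output is dropped (an assumed criterion).
    """
    kept = list(blocks)
    while len(kept) > target_len:
        reference = [run(kept, x) for x in calibration]

        def removal_cost(i: int) -> float:
            trial = kept[:i] + kept[i + 1:]
            errors = [mse(run(trial, x), ref) for x, ref in zip(calibration, reference)]
            return sum(errors) / len(errors)

        drop = min(range(len(kept)), key=removal_cost)
        del kept[drop]
    return kept


def make_residual_block(scale: float) -> Block:
    """Hypothetical toy block: x + scale * x, standing in for a transformer layer."""
    return lambda x: [v + scale * v for v in x]


# Toy usage: two blocks barely change the output, so they are pruned first.
toy_blocks = [make_residual_block(s) for s in (0.5, 0.001, 0.3, 0.002)]
pruned = prune_blocks(toy_blocks, calibration=[[1.0, 2.0]], target_len=2)
```

In this toy setup the two near-identity blocks are removed and the two with a larger effect are kept. A real pruning run would use actual transformer blocks and representative latents, and would typically fine-tune afterwards to recover quality.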

Fasthunyuan Gguf

A GGUF quantized version of FastHunyuan for text-to-video generation; it must be used with ComfyUI.

Hunyuanvideo HFIE

Tencent Hunyuan Video is a text-to-video generation model, compatible with Hugging Face inference endpoints.

Text-to-Video English

A GGUF quantized version of the Mochi text-to-video model, supporting video generation from text descriptions.

Text-to-Video English

Tencent Hunyuan Community Edition's text-to-video model, capable of generating high-quality video content from text prompts.

Text-to-Video English

Nova D48w1024 Osp480

A non-quantized autoregressive text-to-video model developed by the Beijing Academy of Artificial Intelligence, capable of generating and editing videos from text prompts.

Hunyuanvideo Gguf

A GGUF quantized version of Tencent's HunyuanVideo model, designed for text-to-video generation in ComfyUI.

FastHunyuan is the accelerated version of HunyuanVideo, requiring only 6 diffusion sampling steps to generate high-quality videos, achieving approximately an 8x speed improvement compared to the original version.

Hunyuan Video is a text-to-video generation model developed by Tencent.

Cogvideox 2B LiFT

CogVideoX-2B-LiFT is a text-to-video generation model fine-tuned from CogVideoX-1.5 using reward-weighted learning.

Text-to-Video English

A video generation model based on CogVideoX-5b, capable of producing high-quality video content from text descriptions.

Text-to-Video English

Zlikwidcogvideoxlora

LoRA weights trained for THUDM/CogVideoX-2b, targeting text-to-video generation.

CogVideoX is the open-source version of the video generation model from Qingying. The 2B version is an entry-level model that balances compatibility with low operational and development costs.

Text-to-Video English

Vchitect 2.0 2B

Vchitect-2.0 is a parallel Transformer model for scaling video diffusion models, specializing in text-to-video and image-to-video generation tasks.

Video Processing

Animatediff Sparsectrl Scribble

AnimateDiff is a method that transforms static Stable Diffusion models into video generation models by inserting motion modules to achieve coherent video generation.

Animatediff Sparsectrl Rgb

AnimateDiff is a method that utilizes existing Stable Diffusion text-to-image models to create videos by inserting motion module layers to achieve coherent motion between frames.

Latte is a Transformer-based latent diffusion model focused on text-to-video generation tasks, supporting pre-trained weights for multiple datasets.

Text To Video Lvd Zs

A generative model that combines large language models with video diffusion, supporting bounding-box conditional control.

Animatediff Motion Adapter Sdxl V1 0 Beta

AnimateDiff is a method that allows the use of existing Stable Diffusion text-to-image models to create videos.

Animatediff Motion Adapter V1 5 2

AnimateDiff is a method that enables the use of existing Stable Diffusion text-to-image models to create videos.

© 2025 AIbase