TransPixar: Advancing Text-to-Video Generation with Transparency
This repository presents TransPixar, a method for generating RGBA videos, i.e., videos with an alpha channel for transparency. It extends existing text-to-video models to produce aligned RGB and alpha outputs, which are valuable for visual effects and interactive content creation.
Links
Authors
Luozhou Wang*,
Yijun Li**,
Zhifei Chen,
Jui-Hsien Wang,
Zhifei Zhang,
He Zhang,
Zhe Lin,
Yingcong Chen†
Affiliations: HKUST(GZ), HKUST, Adobe Research.
* Internship Project.
** Project Leader.
† Corresponding Author.
Abstract
Text-to-video generative models have advanced significantly, finding applications in entertainment, advertising, and education. However, generating RGBA videos with alpha channels for transparency remains a challenge due to limited datasets and difficulties in adapting existing models. Alpha channels are essential for visual effects, allowing seamless integration of transparent elements. This paper introduces TransPixar, a method that extends pretrained video models for RGBA generation while maintaining RGB capabilities. It uses a diffusion transformer architecture, alpha-specific tokens, and LoRA-based fine-tuning to generate consistent RGB and alpha channels. By optimizing attention mechanisms, TransPixar preserves the strengths of the original RGB model and aligns RGB and alpha channels effectively, even with limited training data.
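To make the design described above more concrete, below is a minimal, illustrative PyTorch sketch of the core idea rather than the repository's actual implementation: alpha-stream tokens are concatenated with the RGB tokens so that a single attention pass covers both streams, and the only new trainable parameters are low-rank (LoRA) adapters on the attention projections. The names LoRALinear and JointRGBAAttention, and all hyperparameters, are hypothetical.

# Illustrative sketch only; not the repository's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """A frozen pretrained projection plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 16):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)        # keep the pretrained RGB weights frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)         # adapter starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

class JointRGBAAttention(nn.Module):
    """Self-attention over the concatenated RGB + alpha token sequence."""
    def __init__(self, dim: int, heads: int = 8, rank: int = 16):
        super().__init__()
        self.heads = heads
        self.to_q = LoRALinear(nn.Linear(dim, dim), rank)
        self.to_k = LoRALinear(nn.Linear(dim, dim), rank)
        self.to_v = LoRALinear(nn.Linear(dim, dim), rank)
        self.to_out = LoRALinear(nn.Linear(dim, dim), rank)

    def forward(self, rgb_tokens, alpha_tokens):
        # One joint sequence, so alpha tokens can attend to RGB tokens and stay aligned.
        x = torch.cat([rgb_tokens, alpha_tokens], dim=1)      # (B, N_rgb + N_alpha, D)
        b, n, d = x.shape
        def split_heads(t):
            return t.view(b, n, self.heads, d // self.heads).transpose(1, 2)
        q, k, v = split_heads(self.to_q(x)), split_heads(self.to_k(x)), split_heads(self.to_v(x))
        out = F.scaled_dot_product_attention(q, k, v)
        out = self.to_out(out.transpose(1, 2).reshape(b, n, d))
        n_rgb = rgb_tokens.shape[1]
        return out[:, :n_rgb], out[:, n_rgb:]                 # RGB stream, alpha stream

Because the adapter's up-projection is zero-initialized, the module initially reproduces the frozen pretrained attention exactly, which is one way to preserve the RGB model's behavior while the alpha pathway is learned.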
Quick Start
News
- [2025.01.07] We have released the project page, arXiv paper, inference code, and Hugging Face demo for TransPixar + CogVideoX.
Todo List
- [x] Release code, paper, and demo.
- [x] Release checkpoints of joint generation (RGB + Alpha).
Contents
Installation
conda create -n TransPixar python=3.10
conda activate TransPixar
pip install -r requirements.txt
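To confirm the environment is ready before running the demos, a quick check such as the following can help. It assumes requirements.txt installs a CUDA-enabled PyTorch build, which may not hold on every platform.

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"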
Features
TransPixar LoRA Hub
Our pipeline supports various video tasks, such as Text-to-RGBA Video and Image-to-RGBA Video, and we provide pre-trained LoRA weights for each task.
Documentation
Training - RGB + Alpha Joint Generation
We have open-sourced the training code for RGBA joint generation with Mochi. For detailed information, please refer to the Mochi README.
Inference
Gradio Demo
In addition to the Hugging Face online demo, you can launch a local inference demo based on CogVideoX-5B by running the following command:
python app.py
Command Line Interface (CLI)
To generate RGBA videos, navigate to the corresponding directory for the video model and execute the following command:
python cli.py \
--lora_path /path/to/lora \
--prompt "..."
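For scripted use, one alternative is to drive the CogVideoX-5B base model through the diffusers library, as sketched below. This is only a sketch: it assumes the released LoRA weights are loadable with load_lora_weights, the model ID, LoRA path, and prompt are placeholders, and a plain CogVideoXPipeline decodes only the RGB stream; the joint RGB + alpha decoding is handled by the repository's cli.py and app.py.

# Hedged sketch; not the repository's inference script.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the base text-to-video model and attach LoRA weights.
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("/path/to/lora")      # same placeholder path as the CLI example
pipe.enable_model_cpu_offload()              # optional: trades speed for lower GPU memory

video = pipe(
    prompt="a glass butterfly flapping its wings",  # hypothetical example prompt
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

export_to_video(video, "output.mp4", fps=8)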
Acknowledgement
- finetrainers: We followed their implementation of Mochi training and inference.
- CogVideoX: We followed their implementation of CogVideoX training and inference.
We appreciate their outstanding work and contributions to the open-source community.
License
This project is licensed under the Apache-2.0 license.
Citation
@misc{wang2025transpixar,
title={TransPixar: Advancing Text-to-Video Generation with Transparency},
author={Luozhou Wang and Yijun Li and Zhifei Chen and Jui-Hsien Wang and Zhifei Zhang and He Zhang and Zhe Lin and Yingcong Chen},
year={2025},
eprint={2501.03006},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2501.03006},
}