
Latte 1

Developed by maxin-cn

Latte is a Transformer-based latent diffusion model for text-to-video generation, with pre-trained weights available for multiple datasets.

Category: Text-to-Video. Open source. License: Apache-2.0. Tags: Text-to-Video Generation, Latent Diffusion Transformer, Multimodal Generation

Downloads 1,027

Release Time: 6/3/2024

Model Overview

Latte is a latent diffusion model based on the Transformer architecture, primarily designed for text-to-video generation tasks. It supports generating high-quality video content from text input and provides pre-trained weights for various datasets.
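To make "Transformer-based latent diffusion" more concrete, the sketch below estimates how many tokens a Transformer would process for one video clip. It is only an illustration under stated assumptions: the 8x VAE downsampling factor, the patch size of 2, and the 16-frame 512x512 clip are not given on this page, and `token_layout` is a hypothetical helper, not part of the Latte code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLayout:
    frames: int            # number of video frames (temporal length)
    tokens_per_frame: int  # spatial tokens obtained from one frame

    @property
    def total_tokens(self) -> int:
        return self.frames * self.tokens_per_frame


def token_layout(frames, height, width, vae_factor=8, patch_size=2):
    """Estimate the token grid for a latent video.

    Assumptions (not from the model card): the VAE shrinks each spatial
    side by `vae_factor`, and the latent frame is then cut into square
    patches of side `patch_size`, one token per patch.
    """
    stride = vae_factor * patch_size  # pixels covered by one token per side
    if height % stride or width % stride:
        raise ValueError(f"height and width must be multiples of {stride}")
    return TokenLayout(frames, (height // stride) * (width // stride))


layout = token_layout(frames=16, height=512, width=512)
print(layout.tokens_per_frame, layout.total_tokens)  # 1024 16384
```

Under these assumptions each frame contributes 1024 tokens, so a 16-frame clip yields 16384 tokens in total, which illustrates why video diffusion is run in a compressed latent space rather than on raw pixels.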

Model Features

Text-to-Video Generation

Supports generating high-quality video content from text descriptions

Multi-Dataset Support

Provides pre-trained weights for multiple datasets including FaceForensics, SkyTimelapse, UCF101, and Taichi-HD

Transformer Architecture

Utilizes a Transformer-based latent diffusion model architecture

Text-to-Image Capability

The latest version Latte-1 also supports text-to-image generation

Model Capabilities

Text-to-Video Generation

Text-to-Image Generation

Use Cases

Video Creation

Creative Video Generation

Automatically generates creative video content based on text descriptions

Can produce high-quality video clips

Education

Educational Video Generation

Automatically generates demonstration videos based on teaching content

🚀 Latte: Latent Diffusion Transformer for Video Generation

This repository provides pre-trained weights for text-to-video generation based on our paper exploring latent diffusion models with transformers (Latte). It addresses the challenge of generating high-quality videos from text descriptions, offering a valuable resource for researchers and developers in the field of multimedia generation. You can find more visualizations on our project page. If you want to obtain pre-trained weights on FaceForensics, SkyTimelapse, UCF101, and Taichi-HD, please refer to here.

✨ Features

News

(🔥 New) May 23, 2024. 💥 Latte-1 for text-to-video generation is released! You can download the pre-trained model here. Latte-1 also supports text-to-image generation. Please run bash sample/t2i.sh.
(🔥 New) Mar. 20, 2024. 💥 An updated LatteT2V model is coming soon, stay tuned!
(🔥 New) Feb. 24, 2024. 💥 We are very grateful that researchers and developers like our work. We will continue to update our LatteT2V model, hoping that our efforts can help the community develop. Our Latte discord channel is created for discussions. Coders are welcome to contribute.
(🔥 New) Jan. 9, 2024. 💥 An updated LatteT2V model initialized with PixArt-α is released. The checkpoint can be found here.
(🔥 New) Oct. 31, 2023. 💥 The training and inference code is released. All checkpoints (including FaceForensics, SkyTimelapse, UCF101, and Taichi-HD) can be found here. In addition, the LatteT2V inference code is provided.
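For the Latte-1 release noted above, one possible way to sample a clip from Python is sketched here. It assumes the Hugging Face diffusers library (a version that ships `LattePipeline`), the accelerate package for CPU offload, and that the weights are published under the repo id `maxin-cn/Latte-1`; none of these are stated on this page. `sampling_kwargs` and `generate_gif` are hypothetical helper names introduced only for this example, and the default values are illustrative.

```python
def sampling_kwargs(prompt, video_length=16, num_inference_steps=50, guidance_scale=7.5):
    """Validate and collect the arguments for one sampling call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if video_length < 1:
        raise ValueError("video_length must be at least 1")
    return {
        "prompt": prompt,
        "video_length": video_length,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }


def generate_gif(prompt, out_path="latte.gif", **overrides):
    """Sample one clip and save it as a GIF (downloads weights on first use)."""
    # Heavy imports are deferred so sampling_kwargs works without a GPU stack.
    import torch
    from diffusers import LattePipeline
    from diffusers.utils import export_to_gif

    # Assumed repo id; adjust if the weights live elsewhere.
    pipe = LattePipeline.from_pretrained("maxin-cn/Latte-1", torch_dtype=torch.float16)
    pipe.enable_model_cpu_offload()  # needs accelerate; trades speed for memory

    frames = pipe(**sampling_kwargs(prompt, **overrides)).frames[0]
    export_to_gif(frames, out_path)
    return out_path
```

Calling `generate_gif("A small cactus with a happy face in the Sahara desert.")` would then write `latte.gif` to the working directory. The repository's own scripts, such as sample/t2i.sh, remain the reference way to run the model.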

📚 Documentation

Contact Us

Yaohui Wang: wangyaohui@pjlab.org.cn
Xin Ma: xin.ma1@monash.edu

Citation

If you find this work useful for your research, please consider citing it.

@article{ma2024latte,
  title={Latte: Latent Diffusion Transformer for Video Generation},
  author={Ma, Xin and Wang, Yaohui and Jia, Gengyun and Chen, Xinyuan and Liu, Ziwei and Li, Yuan-Fang and Chen, Cunjian and Qiao, Yu},
  journal={arXiv preprint arXiv:2401.03048},
  year={2024}
}

Paper: https://huggingface.co/papers/2401.03048

Acknowledgments

Latte has been greatly inspired by the following amazing works and teams: DiT and PixArt-α. We thank all the contributors for open-sourcing their work.

📄 License

This project is licensed under the Apache-2.0 license.
