Nova-d48w1024-osp480 Open Source Video Generation Model - Generate and Edit Videos According to Text Prompts


Developed by BAAI

A non-quantized autoregressive text-to-video model developed by Beijing Academy of Artificial Intelligence, capable of generating and editing videos based on text prompts

Category: Text-to-Video | License: Apache-2.0 | Tags: Text-to-Video Generation, Autoregressive Diffusion Architecture, High-Resolution Video Generation


Release Time: 12/17/2024

Model Overview

A model based on the non-quantized video autoregressive diffusion architecture (NOVA), utilizing a pre-trained text encoder (Phi-2) and video VAE tokenizer (OpenSoraPlanV1.2-VAE), capable of generating and editing videos based on text prompts

Model Features

High-Resolution Video Generation

Capable of generating videos at 768x480 pixel resolution

Text-Conditioned Generation

Generates and edits video content based on text prompts

Adjustable Parameters

Video quality can be improved by increasing the number of autoregressive and diffusion steps

Model Capabilities

Text-to-Video Generation

Video Editing

Single-Frame Image Generation

Use Cases

Research and Education

Generative Model Research

Used for research on generative model technologies

Educational Tool Development

Developing educational or creative tools

Creativity and Design

Artistic Creation

Used for artistic creation and design applications

🚀 NOVA (d48w1024-osp480) Model Card

This is a non-quantized autoregressive text-to-video generation model developed by BAAI. It can generate and modify videos based on text prompts, offering a powerful tool for video-related research and creative applications.

✨ Features

Developed by BAAI.
Non-quantized autoregressive text-to-video generation model.
Model size: 645M.
Model precision: torch.float16 (FP16).
Model resolution: 768x480.
Uses a pretrained text encoder ([Phi-2](https://huggingface.co/microsoft/phi-2)) and a VAE video tokenizer ([OpenSoraPlanV1.2-VAE](https://huggingface.co/LanguageBind/Open-Sora-Plan-v1.2.0)).
Licensed under the Apache 2.0 License.
More information is available in the GitHub repository (https://github.com/baaivision/NOVA).
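
As a rough sanity check on the size and precision figures listed here, the memory taken by the weights alone can be estimated from the parameter count. The sketch below assumes that the 645M figure counts only NOVA's own parameters and that FP16 stores 2 bytes per parameter. The helper name `weight_memory_gib` is illustrative and not part of any library. The Phi-2 text encoder, the VAE, and runtime activations all add to this number.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Estimate weight storage in GiB (weights only, no activations)."""
    return num_params * bytes_per_param / 2**30


# 645M parameters stored as torch.float16 (2 bytes each).
# Assumption: 645M covers only the NOVA model, not the text encoder or VAE.
nova_gib = weight_memory_gib(645_000_000)
print(f"NOVA weights: ~{nova_gib:.2f} GiB")
```

Treat the result as a lower bound when choosing a GPU, since actual usage during generation will be higher.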

📦 Installation

Use the 🤗 Diffusers library to run NOVA simply and efficiently.

pip install diffusers transformers accelerate "imageio[ffmpeg]"
pip install git+ssh://git@github.com/baaivision/NOVA.git
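
After installing, it can help to confirm that the packages are visible to the interpreter before loading the model. The following is a minimal sketch using only the standard library. The module list is an assumption based on the install commands and on the imports in the usage examples (`torch` and `diffnext`), and `missing_modules` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Top-level import names. "torch" and "diffnext" are taken from the usage
# examples; the rest correspond to the pip packages in the install command.
REQUIRED_MODULES = [
    "torch", "diffusers", "transformers", "accelerate", "imageio", "diffnext",
]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("All modules found." if not missing else f"Missing: {', '.join(missing)}")
```

This only checks that each module can be located; it does not verify versions or GPU support.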

💻 Usage Examples

Basic Usage

import torch
from diffnext.pipelines import NOVAPipeline
from diffnext.utils import export_to_image, export_to_video

model_id = "BAAI/nova-d48w1024-osp480"
model_args = {"torch_dtype": torch.float16, "trust_remote_code": True}
pipe = NOVAPipeline.from_pretrained(model_id, **model_args)
pipe = pipe.to("cuda")

prompt = "Many spotted jellyfish pulsating under water."

image = pipe(prompt, max_latent_length=1).frames[0, 0]
export_to_image(image, "jellyfish.jpg")

video = pipe(prompt, max_latent_length=9).frames[0]
export_to_video(video, "jellyfish.mp4", fps=12)
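
The example above uses `max_latent_length=1` for a single image and `max_latent_length=9` for a video exported at 12 fps. The sketch below estimates how many frames and how many seconds a given latent length yields. It assumes a causal video VAE in which the first latent decodes to one frame and each further latent decodes to `temporal_stride` frames, with a stride of 4. That stride is an assumption and is not stated in this model card, so check `len(video)` on real output. Both helper names are illustrative.

```python
def frames_from_latents(latent_length: int, temporal_stride: int = 4) -> int:
    """Frames decoded from `latent_length` latents by a causal video VAE.

    Assumption: the first latent maps to 1 frame and every later latent
    maps to `temporal_stride` frames.
    """
    if latent_length < 1:
        raise ValueError("latent_length must be at least 1")
    return 1 + (latent_length - 1) * temporal_stride


def clip_seconds(num_frames: int, fps: int = 12) -> float:
    """Playback duration of a clip with `num_frames` frames at `fps`."""
    return num_frames / fps


frames = frames_from_latents(9)  # assumed stride of 4
print(frames, "frames,", clip_seconds(frames, fps=12), "seconds")
```

Under these assumptions a latent length of 1 gives exactly one frame, which matches the single-image call in the example.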

Advanced Usage

# Reuses `pipe` and `prompt` from the basic example.
# Increase the autoregressive (AR) and diffusion steps for better video quality.
video = pipe(
    prompt,
    max_latent_length=9,
    num_inference_steps=128,  # AR steps; default: 64
    num_diffusion_steps=100,  # diffusion steps; default: 25
).frames[0]
export_to_video(video, "jellyfish_v2.mp4", fps=12)
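
To switch between the default and the higher-quality settings without retyping the numbers, the step counts can be kept in a small preset table. This is a sketch only: the preset names and the `generation_kwargs` helper are hypothetical, while the step values are the defaults and the increased values quoted in the example above.

```python
# Step counts taken from the example: defaults (64, 25) and increased (128, 100).
STEP_PRESETS = {
    "default": {"num_inference_steps": 64, "num_diffusion_steps": 25},
    "high_quality": {"num_inference_steps": 128, "num_diffusion_steps": 100},
}


def generation_kwargs(preset: str = "default", max_latent_length: int = 9) -> dict:
    """Build keyword arguments for the pipeline call from a named preset."""
    if preset not in STEP_PRESETS:
        raise KeyError(f"Unknown preset {preset!r}; choose from {sorted(STEP_PRESETS)}")
    return {"max_latent_length": max_latent_length, **STEP_PRESETS[preset]}


# Intended use, given the `pipe` and `prompt` defined earlier:
#   video = pipe(prompt, **generation_kwargs("high_quality")).frames[0]
print(generation_kwargs("high_quality"))
```

Higher step counts generally mean longer generation times, so the default preset is a reasonable starting point for quick iterations.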

📚 Documentation

Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

Research on generative models.
Applications in educational or creative tools.
Generation of artworks and use in design and other artistic processes.
Probing and understanding the limitations and biases of generative models.
Safe deployment of models which have the potential to generate harmful content.

Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for its abilities.

Misuse and Malicious Use

Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

Mis- and disinformation.
Representations of egregious violence and gore.
Impersonating individuals without their consent.
Sexual content without consent of the people who might see it.
Sharing of copyrighted or licensed material in violation of its terms of use.
Intentionally promoting or propagating discriminatory content or harmful stereotypes.
Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use.
Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc.

Limitations and Bias

Limitations

The autoencoding part of the model is lossy.
The model cannot render complex legible text.
The model does not achieve perfect photorealism.
Fine details such as fingers may not be generated properly.
The model was trained on a subset of the web datasets [LAION-5B](https://laion.ai/blog/laion-5b/) and [COYO-700M](https://github.com/kakaobrain/coyo-dataset), which contain adult, violent, and sexual content.

Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.

📄 License

This model is licensed under the Apache 2.0 License.
