LTX-Video-0.9.1-diffusers Open-Source Model - Text-to-Video and Image-to-Video Generation


LTX Video 0.9.1 Diffusers

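Published by a-r-r-o-w (a Diffusers-format conversion of Lightricks' LTX-Video)

LTX-Video model in Diffusers format, supporting text-to-video and image-to-video generation

Text-to-Video #text-to-video #image-to-video #high-frame-rate-video-generation

Downloads: 3,951

Released: 12/22/2024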

Model Overview

LTX-Video is a text-to-video and image-to-video generation model that produces high-quality video from a text description or an input image.

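Model Features

High-Quality Video Generation

Generates detailed videos with coherent motion

Dual-Mode Support

Supports both text-to-video and image-to-video generation

Fine-Grained Control

Generated content can be steered with a prompt and a negative prompt

Adjustable Parameters

Exposes generation parameters such as frame count, resolution, and number of inference steps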
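Negative prompts are usually applied through classifier-free guidance (CFG): the model is evaluated once with the prompt and once with the negative prompt, and the two predictions are blended. The NumPy sketch below assumes the standard CFG formula; it illustrates the technique and is not code taken from LTXPipeline (whose strength setting is the guidance_scale argument). The function name is hypothetical.

```python
import numpy as np


def classifier_free_guidance(pred_negative: np.ndarray,
                             pred_prompt: np.ndarray,
                             guidance_scale: float) -> np.ndarray:
    """Blend two model predictions with classifier-free guidance (hypothetical helper).

    pred_negative: prediction conditioned on the negative prompt
                   (or on an empty prompt when no negative prompt is given).
    pred_prompt:   prediction conditioned on the positive prompt.
    """
    # Start from the negative-prompt prediction and push toward the prompt
    # prediction; scales above 1 exaggerate the difference between them.
    return pred_negative + guidance_scale * (pred_prompt - pred_negative)


negative = np.array([0.0, 1.0])
positive = np.array([1.0, 1.0])
print(classifier_free_guidance(negative, positive, 3.0))  # [3. 1.]
```

With guidance_scale=1.0 the result is just the prompt prediction; larger values move the output further from whatever the negative prompt describes.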

Model Capabilities

Text-to-video generation

Image-to-video generation

Video style control

Video content editing

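Use Cases

Creative Content Production

Short Video Creation

Automatically generate creative short videos from text descriptions

Quickly produce short videos for social media

Advertising

Generate dynamic ad videos from product images

Reduce the cost and time of producing ad videos

Film Pre-production

Concept Visualization

Quickly visualize script scenes

Help directors and their teams grasp scene concepts quickly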

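🚀 Unofficial Diffusers-Format LTX-Video Weights

This repository provides unofficial Diffusers-format weights for https://huggingface.co/Lightricks/LTX-Video (version 0.9.1), so the model can be loaded directly with the diffusers LTXPipeline (text-to-video) and LTXImageToVideoPipeline (image-to-video) classes.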

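🚀 Quick Start

Prerequisites

Make sure torch and diffusers (a release that includes LTXPipeline and LTXImageToVideoPipeline) are installed, along with transformers, which diffusers uses to load the T5 text encoder, and that a CUDA-capable GPU is available.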
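A quick way to confirm the prerequisites before downloading the weights is sketched below. It assumes the examples need torch, diffusers, and transformers; the helper name missing_packages is hypothetical, and the check only verifies that the packages can be found, not that their versions support LTX-Video 0.9.1.

```python
import importlib.util

# Assumed requirements: the packages the examples import, plus transformers,
# which diffusers uses to load the T5 text encoder.
REQUIRED_PACKAGES = ["torch", "diffusers", "transformers"]


def missing_packages(packages):
    """Return the names in `packages` that cannot be found (hypothetical helper).

    importlib.util.find_spec locates a top-level module without importing it,
    so the check is cheap even for large packages such as torch.
    """
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch  # imported only once we know it is installed

        print("CUDA available:", torch.cuda.is_available())
```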

Text-to-Video

The following example generates a video from a text prompt:

import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Load the Diffusers-format weights in bfloat16 and move the pipeline to the GPU
pipe = LTXPipeline.from_pretrained("a-r-r-o-w/LTX-Video-0.9.1-diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

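video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,  # content to steer away from
    width=704,                 # width and height divisible by 32
    height=480,
    num_frames=161,            # frame count of the form 8k + 1
    num_inference_steps=50,    # number of denoising steps
    decode_timestep=0.03,      # VAE decoding settings used with the 0.9.1 weights
    decode_noise_scale=0.025,
).frames[0]                    # frames of the first (and only) generated video
# 161 frames at 24 fps is roughly 6.7 seconds of video
export_to_video(video, "output.mp4", fps=24)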
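The upstream Lightricks LTX-Video README states that the model works natively on resolutions divisible by 32 and frame counts of the form 8k + 1 (other values are padded and then cropped). The example's 704x480 and 161 frames already satisfy this. The sketch below assumes those two rules and rounds an arbitrary request down to compatible values; snap_to_ltx_shape and clip_seconds are hypothetical helpers, not part of diffusers.

```python
def snap_to_ltx_shape(width: int, height: int, num_frames: int):
    """Round a requested shape down to values LTX-Video handles natively.

    Assumes width/height must be multiples of 32 and num_frames must be 8k + 1.
    """
    snapped_w = max(32, (width // 32) * 32)
    snapped_h = max(32, (height // 32) * 32)
    snapped_f = max(1, ((num_frames - 1) // 8) * 8 + 1)
    return snapped_w, snapped_h, snapped_f


def clip_seconds(num_frames: int, fps: int) -> float:
    """Duration of a clip written with export_to_video(..., fps=fps)."""
    return num_frames / fps


print(snap_to_ltx_shape(704, 480, 161))  # (704, 480, 161), already valid
print(snap_to_ltx_shape(720, 500, 150))  # (704, 480, 145)
print(round(clip_seconds(161, 24), 2))   # 6.71
```

Rounding down rather than up keeps memory use at or below what was requested; the snapped values can be passed straight to the width, height, and num_frames arguments.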

Image-to-Video

The following example generates a video from an input image and a text prompt:

import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Load the image-to-video pipeline from the same Diffusers-format weights
pipe = LTXImageToVideoPipeline.from_pretrained("a-r-r-o-w/LTX-Video-0.9.1-diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = load_image(
    "https://huggingface.co/datasets/a-r-r-o-w/tiny-meme-dataset-captioned/resolve/main/images/8.png"
)
prompt = "A young girl stands calmly in the foreground, looking directly at the camera, as a house fire rages in the background. Flames engulf the structure, with smoke billowing into the air. Firefighters in protective gear rush to the scene, a fire truck labeled '38' visible behind them. The girl's neutral expression contrasts sharply with the chaos of the fire, creating a poignant and emotionally charged scene."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

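video = pipe(
    image=image,               # conditioning image used as the first frame
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,                 # width and height divisible by 32
    height=480,
    num_frames=161,            # frame count of the form 8k + 1
    num_inference_steps=50,
    decode_timestep=0.03,      # VAE decoding settings used with the 0.9.1 weights
    decode_noise_scale=0.025,
).frames[0]
export_to_video(video, "output.mp4", fps=24)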
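The image-to-video example fixes the output at 704x480 regardless of the input image's shape. One option, sketched below under the assumption that sides must be multiples of 32 (per the upstream LTX-Video README), is to pick a resolution that follows the image's aspect ratio while staying within the same pixel budget. fit_resolution is a hypothetical helper, not a diffusers API.

```python
import math


def fit_resolution(image_width: int, image_height: int,
                   max_pixels: int = 704 * 480, multiple: int = 32):
    """Pick (width, height) that follows the image's aspect ratio (hypothetical helper).

    The image is scaled so its area matches max_pixels, then both sides are
    rounded down to a multiple of `multiple`, so width * height <= max_pixels.
    """
    scale = math.sqrt(max_pixels / (image_width * image_height))
    width = max(multiple, int(image_width * scale) // multiple * multiple)
    height = max(multiple, int(image_height * scale) // multiple * multiple)
    return width, height


print(fit_resolution(704, 480))    # (704, 480)
print(fit_resolution(1024, 1024))  # (576, 576)
```

Since load_image returns a PIL image, its size tuple can be unpacked directly, e.g. `width, height = fit_resolution(*image.size)`, before passing width and height to the pipeline call.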