LTX-Video-0.9.1-diffusers open-source model: text-to-video and image-to-video generation

LTX Video 0.9.1 Diffusers

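Developed by a-r-r-o-w

An LTX-Video model in Diffusers format that supports text-to-video and image-to-video generation

Text-to-video #text-to-video #image-to-video #high-frame-rate-video-generation

Downloads: 3,951

Released: 12/22/2024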

Model overview

LTX-Video is a text-to-video and image-to-video generation model that produces high-quality video from a text description or an input image.

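Model features

High-quality video generation

Generates high-quality video with coherent motion and detail

Dual-mode support

Supports both text-to-video and image-to-video generation

Fine-grained control

Prompts and negative prompts steer the generated content

Adjustable parameters

Exposes tunable parameters such as frame count, resolution, and number of inference steps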
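
The adjustable parameters are not fully independent. The upstream Lightricks model card states that the model works on resolutions divisible by 32 and on frame counts of the form 8n + 1. The following is a minimal sketch that rounds requested values down to the nearest valid ones, assuming those constraints also apply to this checkpoint. `snap_ltx_params` is a hypothetical helper written for illustration and is not part of diffusers.

```python
from typing import Tuple


def snap_ltx_params(width: int, height: int, num_frames: int) -> Tuple[int, int, int]:
    """Round values down to satisfy the assumed LTX-Video constraints.

    Assumptions (taken from the upstream model card, not verified here):
    width and height must be multiples of 32, num_frames must be 8n + 1.
    """
    if width < 32 or height < 32 or num_frames < 1:
        raise ValueError("width and height must be >= 32 and num_frames >= 1")
    snapped_width = (width // 32) * 32
    snapped_height = (height // 32) * 32
    # Largest value of the form 8n + 1 that does not exceed num_frames
    snapped_frames = ((num_frames - 1) // 8) * 8 + 1
    return snapped_width, snapped_height, snapped_frames


print(snap_ltx_params(720, 500, 160))  # prints (704, 480, 153)
```

Rounding down rather than up keeps memory use at or below what was requested; rounding to the nearest valid value would be an equally reasonable choice.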

Model capabilities

Text-to-video generation

Image-to-video generation

Video style control

Video content editing

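Use cases

Creative content production

Short-video creation

Automatically generate creative short videos from text descriptions

Quickly produce short videos suitable for social media

Advertising production

Generate dynamic ad videos from product images

Reduce the cost and time of producing ad videos

Film and TV pre-production

Concept visualization

Quickly visualize scenes from a script

Help directors and crews quickly grasp the intended scene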

🚀 Unofficial Diffusers-format weights for LTX-Video

This project provides unofficial Diffusers-format weights for https://huggingface.co/Lightricks/LTX-Video (version 0.9.1). It supports both text-to-video and image-to-video generation.

🚀 Quick start

Environment setup

Make sure the torch and diffusers libraries are installed and that a CUDA-capable GPU is available.
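
One way to confirm these prerequisites before loading any weights is a small check script. This is a sketch only: `check_environment` is a hypothetical helper name, and it reports nothing beyond whether the two packages can be found and whether PyTorch sees a CUDA device.

```python
import importlib.util


def check_environment() -> dict:
    """Report whether torch and diffusers are importable and CUDA is usable."""
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "diffusers": importlib.util.find_spec("diffusers") is not None,
        "cuda": False,
    }
    # Only query CUDA when torch is actually installed
    if report["torch"]:
        import torch

        report["cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```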

Text-to-video

The following example generates a video from a text prompt:

import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Load the Diffusers-format weights in bfloat16 and move the pipeline to the GPU
pipe = LTXPipeline.from_pretrained("a-r-r-o-w/LTX-Video-0.9.1-diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

# Generate 161 frames at 704x480; .frames[0] selects the first (only) video in the batch
video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
    decode_timestep=0.03,
    decode_noise_scale=0.025,
).frames[0]
# Write the frames to an MP4 file at 24 frames per second
export_to_video(video, "output.mp4", fps=24)
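
The clip length follows directly from the frame count and the export frame rate used in the example (161 frames at 24 fps). A minimal sketch of that arithmetic, using a hypothetical helper `clip_duration_seconds`:

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Return the playback length of a clip in seconds."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


duration = clip_duration_seconds(161, 24)
print(f"{duration:.2f} s")  # prints 6.71 s
```

Changing `fps` in `export_to_video` alters only the playback speed and duration, not the number of generated frames.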

Image-to-video

The following example generates a video from an input image:

import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Load the Diffusers-format weights in bfloat16 and move the pipeline to the GPU
pipe = LTXImageToVideoPipeline.from_pretrained("a-r-r-o-w/LTX-Video-0.9.1-diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Conditioning image that the generated video starts from
image = load_image(
    "https://huggingface.co/datasets/a-r-r-o-w/tiny-meme-dataset-captioned/resolve/main/images/8.png"
)
prompt = "A young girl stands calmly in the foreground, looking directly at the camera, as a house fire rages in the background. Flames engulf the structure, with smoke billowing into the air. Firefighters in protective gear rush to the scene, a fire truck labeled '38' visible behind them. The girl's neutral expression contrasts sharply with the chaos of the fire, creating a poignant and emotionally charged scene."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

# Generate 161 frames at 704x480; .frames[0] selects the first (only) video in the batch
video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
    decode_timestep=0.03,
    decode_noise_scale=0.025,
).frames[0]
# Write the frames to an MP4 file at 24 frames per second
export_to_video(video, "output.mp4", fps=24)
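
When the input image's aspect ratio differs from the requested 704x480 output, cropping it to the target ratio beforehand may help avoid stretching. How the pipeline itself resizes the image is not covered here, so treat this as an optional preprocessing sketch. `center_crop_box` is a hypothetical helper that only computes the crop rectangle.

```python
from typing import Tuple


def center_crop_box(src_w: int, src_h: int, target_w: int, target_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered region
    of the source that has the target aspect ratio."""
    if min(src_w, src_h, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # Compare aspect ratios by cross-multiplication to stay in integers
    if src_w * target_h > src_h * target_w:
        # Source is wider than the target: trim the left and right sides
        crop_w = src_h * target_w // target_h
        crop_h = src_h
    else:
        # Source is taller than (or equal to) the target: trim top and bottom
        crop_w = src_w
        crop_h = src_w * target_h // target_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


print(center_crop_box(1920, 1080, 704, 480))  # prints (168, 0, 1752, 1080)
```

Since `load_image` returns a Pillow image, the box can be applied with `image.crop(box)` and the result resized with `image.resize((704, 480))` before it is passed to the pipeline.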