LTX-Video 0.9.1: open-source video generation model with real-time, high-quality text-to-video and image-to-video support

LTX Video 0.9.1

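Developed by Lightricks

A real-time, high-quality video generation model based on the DiT architecture, supporting two application scenarios: text-to-video and image-to-video.

Text-to-video | English | Open-source license: Other | #real-time high-quality video generation #DiT architecture #1216×704 resolution

Downloads: 64

Release date: 3/16/2025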

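Model overview

LTX-Video is the first real-time, high-quality video generation model based on the DiT architecture. It generates high-resolution, content-rich, realistic video at 1216×704 resolution and 30 frames per second.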

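Model features

Real-time, high-quality video generation

Generates video at 1216×704 resolution and 30 frames per second, faster than the video can be watched in real time.

Multiple versions for different needs

2B and 13B parameter versions, plus distilled versions, are available to balance quality against performance.

Dual-modal input support

Supports both text-to-video and image-to-video generation.

High-resolution output

Supports output at up to 1216×704 resolution, and performs best at resolutions of 720×1280 or below with 257 frames or fewer.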

Model capabilities

Text-to-video generation

Image-to-video generation

High-resolution video synthesis

Real-time video rendering

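Use cases

Film and video production

Scene previews

Quickly generate previews of scenes from a script.

Examples include film-quality shots such as a prison scene and a city street.

Creative content

Dynamic visual content creation

Generate creative short videos from text descriptions.

Examples include varied content such as natural landscapes and close-ups of people.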

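🚀 LTX-Video model card

LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real time. It produces 30 FPS videos at 1216×704 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and diverse content. This model card covers the models associated with LTX-Video; the codebase is available in the LTX-Video repository on GitHub.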

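🚀 Quick start

You can get started with the LTX-Video model right away. The sections below cover usage and caveats.

✨ Key features

Real-time, high-quality video generation.
Supports both text-to-video and image+text-to-video use cases.
Supports a range of resolutions and frame counts.

📦 Installation

The codebase was tested with Python 3.10.5 and CUDA 12.2, and supports PyTorch >= 2.1.2.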

git clone https://github.com/Lightricks/LTX-Video.git
cd LTX-Video

# create env
python -m venv env
source env/bin/activate
python -m pip install -e .\[inference-script\]
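
Before running inference, it can help to confirm that an existing environment meets the PyTorch minimum stated above. The following is a minimal sketch and not part of the LTX-Video codebase: the helper names `parse_version` and `meets_minimum` are hypothetical, and the comparison only looks at the leading numeric part of a version string such as `2.3.0+cu121`.

```python
import re

MIN_TORCH = "2.1.2"  # minimum PyTorch version stated in the installation notes


def parse_version(version: str) -> tuple[int, ...]:
    """Return the leading numeric release part of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_TORCH) -> bool:
    """True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

Once PyTorch is installed, `meets_minimum(torch.__version__)` would perform the check against the running environment.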

💻 Usage examples

Basic usage

Text-to-video generation

python inference.py --prompt "PROMPT" --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config ltxv-13b-0.9.7-dev.yaml

Image-to-video generation

python inference.py --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH --num_frames NUM_FRAMES --seed SEED --pipeline_config ltxv-13b-0.9.7-dev.yaml
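
When the commands above are launched from a script, assembling the arguments as a list avoids shell-quoting problems with long prompts. The sketch below uses only the flags shown in the two commands; the function name `build_inference_command` and the prompt and seed values are hypothetical, chosen for illustration.

```python
import shlex


def build_inference_command(prompt, height, width, num_frames, seed,
                            pipeline_config, input_image_path=None):
    """Build the argument list for inference.py using only the flags shown above."""
    cmd = ["python", "inference.py", "--prompt", prompt]
    if input_image_path is not None:
        # Supplying an image selects the image-to-video form of the command.
        cmd += ["--input_image_path", str(input_image_path)]
    cmd += [
        "--height", str(height),
        "--width", str(width),
        "--num_frames", str(num_frames),
        "--seed", str(seed),
        "--pipeline_config", pipeline_config,
    ]
    return cmd


# Illustrative values only; the prompt and seed are not from the model card.
cmd = build_inference_command(
    prompt="A red fox runs through fresh snow",
    height=480, width=704, num_frames=161, seed=42,
    pipeline_config="ltxv-13b-0.9.7-dev.yaml",
)
print(shlex.join(cmd))  # shell-quoted form, useful for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.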

Advanced usage

Text-to-video generation with the Diffusers library

import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=24)

Image-to-video generation with the Diffusers library

import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = load_image(
    "https://huggingface.co/datasets/a-r-r-o-w/tiny-meme-dataset-captioned/resolve/main/images/8.png"
)
prompt = "A young girl stands calmly in the foreground, looking directly at the camera, as a house fire rages in the background. Flames engulf the structure, with smoke billowing into the air. Firefighters in protective gear rush to the scene, a fire truck labeled '38' visible behind them. The girl's neutral expression contrasts sharply with the chaos of the fire, creating a poignant and emotionally charged scene."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=24)
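
The `num_frames` value passed to the pipeline and the `fps` value passed to `export_to_video` together determine how long the exported clip plays. A small arithmetic sketch makes the relationship explicit; the function name is hypothetical.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length of a clip: frame count divided by frame rate."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


# Values used in the Diffusers examples above.
duration = clip_duration_seconds(num_frames=161, fps=24)
print(f"{duration:.2f} s")  # 161 / 24 is roughly 6.71 s
```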

📚 Documentation

Model details

Attribute	Details
Developed by	Lightricks
Model type	Diffusion-based text-to-video and image-to-video generation model
Language	English

Usage

Direct use

You may use the model within the terms of its license. The license for each version is available at the links below.

2B version 0.9: license
2B version 0.9.1: license
2B version 0.9.5: license
2B version 0.9.6-dev: license
2B version 0.9.6-distilled: license
13B version 0.9.7-dev: license
13B version 0.9.7-dev-fp8: license
Temporal upscaler version 0.9.7: license
Spatial upscaler version 0.9.7: license

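General tips

⚠️ Important

The model works on resolutions divisible by 32 and on frame counts of the form 8n + 1 (for example, 257). If the resolution is not divisible by 32, or the frame count is not of the form 8n + 1, the input is padded with -1 and then cropped to the desired resolution and number of frames.

The model works best at resolutions of 720 x 1280 or below and at 257 frames or fewer.

Prompts must be written in English. The more detailed the prompt, the better the results.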
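
The size constraints above can be checked before a request is sent to the model. The sketch below tests whether the height and width are divisible by 32 and whether the frame count is one more than a multiple of 8 (257 = 8 × 32 + 1). It then computes the smallest valid size that covers a request. That the padding goes up to exactly this nearest valid size is an assumption made for illustration, not something the model card states, and all names here are hypothetical.

```python
RESOLUTION_MULTIPLE = 32  # height and width should be divisible by 32
FRAME_STRIDE = 8          # frame counts should be one more than a multiple of 8


def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple (integer arithmetic only)."""
    return -(-value // multiple) * multiple


def is_valid_request(height: int, width: int, num_frames: int) -> bool:
    """True if the request already satisfies both size constraints."""
    return (
        height % RESOLUTION_MULTIPLE == 0
        and width % RESOLUTION_MULTIPLE == 0
        and num_frames % FRAME_STRIDE == 1
    )


def padded_request(height: int, width: int, num_frames: int) -> tuple[int, int, int]:
    """Smallest valid (height, width, num_frames) covering the request.

    Assumption: padding targets the nearest valid size at or above the request.
    """
    return (
        round_up(height, RESOLUTION_MULTIPLE),
        round_up(width, RESOLUTION_MULTIPLE),
        round_up(num_frames - 1, FRAME_STRIDE) + 1,
    )


print(is_valid_request(480, 704, 161))  # the sizes used in the Diffusers examples
print(padded_request(500, 700, 100))
```

Choosing a valid size up front avoids the pad-and-crop step altogether.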

💡 Usage tip

Example of a good prompt: The turquoise waves crash against the dark, jagged rocks of the shore, sending white foam spraying into the air. The scene is dominated by the stark contrast between the bright blue water and the dark, almost black rocks. The water is a clear, turquoise color, and the waves are capped with white foam. The rocks are dark and jagged, and they are covered in patches of green moss. The shore is lined with lush green vegetation, including trees and bushes. In the background, there are rolling hills covered in dense forest. The sky is cloudy, and the light is dim.

Online demo

The model can be accessed at the links below.

ComfyUI

To use the model with ComfyUI, follow the instructions in the ComfyUI repository.

Model variants

Model	Version	Notes	inference.py config	ComfyUI workflow (recommended)
ltxv-13b	0.9.7	Highest quality, requires more VRAM	ltxv-13b-0.9.7-dev.yaml	ltxv-13b-i2v-base.json
ltxv-13b-fp8	0.9.7	Quantized model	Coming soon	ltxv-13b-i2v-base-fp8.json
ltxv-2b	0.9.6	Good quality, lower VRAM requirement than ltxv-13b	ltxv-2b-0.9.6-dev.yaml	ltxvideo-i2v.json
ltxv-2b-distilled	0.9.6	15x faster, real-time capable, requires fewer steps, no STG/CFG needed	ltxv-2b-0.9.6-distilled.yaml	ltxvideo-i2v-distilled.json
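
For scripts that switch between variants, the table above can be kept as a small lookup. This is a sketch that copies the file names directly from the table; the `Variant` type and the `inference_config_for` helper are hypothetical, and the entry with no published config is represented as `None`.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    version: str
    inference_config: Optional[str]  # None where the table lists no config yet
    comfyui_workflow: str


VARIANTS = {
    "ltxv-13b": Variant("0.9.7", "ltxv-13b-0.9.7-dev.yaml", "ltxv-13b-i2v-base.json"),
    "ltxv-13b-fp8": Variant("0.9.7", None, "ltxv-13b-i2v-base-fp8.json"),
    "ltxv-2b": Variant("0.9.6", "ltxv-2b-0.9.6-dev.yaml", "ltxvideo-i2v.json"),
    "ltxv-2b-distilled": Variant(
        "0.9.6", "ltxv-2b-0.9.6-distilled.yaml", "ltxvideo-i2v-distilled.json"
    ),
}


def inference_config_for(model: str) -> str:
    """Return the inference.py config file name listed for a model variant."""
    variant = VARIANTS[model]  # raises KeyError for names not in the table
    if variant.inference_config is None:
        raise LookupError(f"{model} has no inference.py config listed yet")
    return variant.inference_config
```

The returned name is the value to pass to `--pipeline_config` in the basic usage commands.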