VidToMe open-source video editing: zero-shot operation, better temporal coherence, and lower memory use

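VidToMe

Developed by jadechoghari

A zero-shot video editing pipeline built on diffusion models. It merges self-attention tokens across video frames to improve temporal coherence and reduce memory use.

Text-to-video | License: MIT | #zero-shot-video-editing #cross-frame-token-merging #self-attention-optimization

Downloads: 15

Published: 10/7/2024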

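Model overview

VidToMe is a video editing technique that requires no model fine-tuning. It aligns and compresses redundant tokens across frames to keep generated and edited video consistent, with smooth transitions and coherent output.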

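Model features

Zero-shot editing

Edit video content directly from a natural-language prompt, with no model fine-tuning.

Cross-frame token merging

Merges self-attention tokens across video frames to improve temporal coherence.

Memory optimization

Compresses redundant tokens to reduce memory use, which suits long videos and complex scenes.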

Model capabilities

Video style transfer

Prompt-based video editing

Temporal coherence optimization

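Use cases

Content creation

Video style transfer

Convert a source video to a different style (for example, origami) with a natural-language prompt

Keeps the structure of the original content while changing its artistic style

Film and TV production

Effects editing

Add or modify elements in a video without complex post-production

Lowers the technical barrier to professional video editing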

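🚀 VidToMe: Video Token Merging for Zero-Shot Video Editing

VidToMe lets you edit a video with a single prompt! 🎥 Implemented on top of Diffusers, VidToMe is a diffusion-based zero-shot video editing pipeline. It merges self-attention tokens across video frames, which improves temporal consistency and reduces memory use, and it needs no model fine-tuning. By aligning and compressing redundant tokens between frames, VidToMe keeps transitions smooth and the output coherent, improving on traditional video editing methods. The implementation is based on the paper of the same name.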
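To make the idea of merging tokens across frames concrete, here is a minimal, dependency-free sketch. It is an illustration only and not VidToMe's actual implementation: the names `cosine` and `merge_tokens`, the greedy nearest-match strategy, and the plain averaging are all assumptions chosen for brevity. Each token in a source frame is compared with the tokens of a target frame; the `r` most similar source tokens are averaged into their matches and the rest are kept unchanged.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_tokens(src, dst, r):
    """Merge the r source tokens that are most similar to a target token.

    src, dst: lists of token vectors from two frames (hypothetical inputs).
    Returns (merged_dst, kept_src): the target tokens after averaging in
    their matched source tokens, and the source tokens left unmerged.
    """
    # For every source token, find its best-matching target token.
    matches = []
    for i, s in enumerate(src):
        j, sim = max(((j, cosine(s, d)) for j, d in enumerate(dst)),
                     key=lambda pair: pair[1])
        matches.append((sim, i, j))

    # Only the r most similar source tokens are treated as redundant.
    matches.sort(reverse=True)
    merged = matches[:r]
    merged_idx = {i for _, i, _ in merged}

    # Average each target token with the source tokens merged into it.
    sums = [list(d) for d in dst]
    counts = [1] * len(dst)
    for _, i, j in merged:
        sums[j] = [a + b for a, b in zip(sums[j], src[i])]
        counts[j] += 1
    merged_dst = [[v / c for v in vec] for vec, c in zip(sums, counts)]
    kept_src = [s for i, s in enumerate(src) if i not in merged_idx]
    return merged_dst, kept_src


frame_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy tokens, frame A
frame_b = [[0.9, 0.1], [0.1, 0.9]]               # toy tokens, frame B
merged_b, kept_a = merge_tokens(frame_a, frame_b, r=2)
print(len(frame_a) + len(frame_b), "->", len(merged_b) + len(kept_a))
# 5 -> 3
```

Attention then runs over the 3 remaining tokens instead of 5. A real pipeline would also have to restore the per-frame token layout after attention; that step is omitted here.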

🚀 Quick Start

Install dependencies

Make sure the diffusers library is installed. If it is not, install it with:

pip install diffusers
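If you are not sure whether the library is already present, a quick check from Python avoids a redundant install. The sketch below uses only the standard library; the helper name `installed_version` is an illustrative choice, not part of diffusers.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("diffusers")
if version is None:
    print("diffusers is not installed; run: pip install diffusers")
else:
    print(f"diffusers {version} is installed")
```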

Run the code

The following example edits a video with VidToMe:

from diffusers import DiffusionPipeline

# Load the pretrained model
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/VidToMe",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/VidToMe",
    sd_version="depth",
    device="cuda",
    float_precision="fp16",
)

# Set the inversion and generation prompts
inversion_prompt = "flamingos standing in the water near a tree."
generation_prompt = {"origami": "rainbow-colored origami flamingos standing in the water near a tree."}

# Additional control and parameters
control_type = "none"  # no extra control; use "depth" if needed
negative_prompt = ""   # defined here but not passed to the call below

# Run the video editing pipeline
generated_images = pipeline(
    video_path="path/to/video.mp4",   # path to the input video
    video_prompt=inversion_prompt,    # inversion prompt
    edit_prompt=generation_prompt,    # edit prompt used for generation
    control_type=control_type,        # control type (e.g. "none", "depth")
)

Note: for finer control, create a configuration file and follow the instructions in the GitHub repository.
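In the example, the edit prompt is a dictionary that maps a short label to a full prompt. When trying several styles on the same clip, it can help to build that dictionary from one shared subject. The sketch below is only a convenience: `build_edit_prompts` is a hypothetical helper, the "watercolor" entry is invented for illustration, and whether the pipeline processes more than one entry per call is an assumption to verify against the repository.

```python
def build_edit_prompts(subject, styles):
    """Map each style label to an edit prompt built from a shared subject.

    styles: dict of label -> phrase placed in front of the subject.
    """
    return {label: f"{phrase} {subject}" for label, phrase in styles.items()}


subject = "flamingos standing in the water near a tree."
styles = {
    "origami": "rainbow-colored origami",
    "watercolor": "a watercolor painting of",   # hypothetical extra style
}
edit_prompts = build_edit_prompts(subject, styles)
print(edit_prompts["origami"])
# rainbow-colored origami flamingos standing in the water near a tree.
```

The "origami" entry reproduces the prompt used in the example, so the resulting dictionary can be passed as `edit_prompt` in the same way.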

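✨ Key Features

Zero-shot video editing: content creators can edit videos quickly without a complex workflow.
Natural-language prompts drive the video transformation, which lowers the barrier to use.
Memory use is optimized for generating long or complex video sequences, which improves processing efficiency.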
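The memory point can be illustrated with back-of-the-envelope arithmetic. A dense self-attention layer builds a score matrix with one entry per pair of tokens, so its size grows with the square of the token count, and removing tokens before attention shrinks it quadratically. The sketch below is an estimate under stated assumptions: one attention head, fp16 scores at 2 bytes each, a hypothetical 4 frames of 4096 tokens attended together, and a hypothetical 50% merge ratio. None of these figures are measurements of VidToMe.

```python
def attention_matrix_bytes(num_tokens, bytes_per_score=2, heads=1):
    """Bytes needed for a dense num_tokens x num_tokens attention score matrix."""
    return heads * num_tokens * num_tokens * bytes_per_score


def tokens_after_merge(num_tokens, merge_ratio):
    """Token count left after merging away a fraction of the tokens."""
    return num_tokens - int(num_tokens * merge_ratio)


tokens = 4 * 4096                      # hypothetical: 4 frames x 4096 tokens
before = attention_matrix_bytes(tokens)
after = attention_matrix_bytes(tokens_after_merge(tokens, 0.5))
print(f"before: {before / 2**20:.0f} MiB, after: {after / 2**20:.0f} MiB")
# before: 512 MiB, after: 128 MiB
```

Halving the token count cuts the score matrix to a quarter of its size, which is why merging redundant tokens matters more as clips get longer.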

💻 Usage Examples

Basic usage

The basic case is the Quick Start example above, run with control_type set to "none".

Advanced usage

# Advanced usage adjusts the parameters to your needs, for example a different control type or different prompts.
# This example uses the "depth" control type:
from diffusers import DiffusionPipeline

# Load the pretrained model
pipeline = DiffusionPipeline.from_pretrained(
    "jadechoghari/VidToMe",
    trust_remote_code=True,
    custom_pipeline="jadechoghari/VidToMe",
    sd_version="depth",
    device="cuda",
    float_precision="fp16",
)

# Set the inversion and generation prompts
inversion_prompt = "flamingos standing in the water near a tree."
generation_prompt = {"origami": "rainbow-colored origami flamingos standing in the water near a tree."}

# Additional control and parameters
control_type = "depth"  # use the "depth" control type
negative_prompt = ""    # defined here but not passed to the call below

# Run the video editing pipeline
generated_images = pipeline(
    video_path="path/to/video.mp4",   # path to the input video
    video_prompt=inversion_prompt,    # inversion prompt
    edit_prompt=generation_prompt,    # edit prompt used for generation
    control_type=control_type,        # control type (e.g. "none", "depth")
)
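Since the control type is the main thing that changes between runs, the call can be wrapped in a small helper that checks its inputs first. This is a sketch, not part of the VidToMe API: `run_edit` and `fake_pipeline` are hypothetical names, and the keyword arguments forwarded are simply the ones used in the example above. The pipeline object is passed in, so the helper can be exercised with a stand-in before the real model is loaded.

```python
import tempfile
from pathlib import Path


def run_edit(pipeline, video_path, inversion_prompt, edit_prompts,
             control_type="none"):
    """Validate the inputs, then call the pipeline with the documented arguments."""
    if not Path(video_path).is_file():
        raise FileNotFoundError(f"input video not found: {video_path}")
    if not isinstance(edit_prompts, dict) or not edit_prompts:
        raise ValueError("edit_prompts must be a non-empty dict of label -> prompt")
    return pipeline(
        video_path=str(video_path),
        video_prompt=inversion_prompt,
        edit_prompt=edit_prompts,
        control_type=control_type,
    )


def fake_pipeline(**kwargs):
    """Stand-in for the real pipeline: returns the arguments it received."""
    return kwargs


# Dry run against an empty temporary file instead of a real clip.
with tempfile.NamedTemporaryFile(suffix=".mp4") as clip:
    result = run_edit(
        fake_pipeline,
        clip.name,
        "flamingos standing in the water near a tree.",
        {"origami": "rainbow-colored origami flamingos standing in the water near a tree."},
        control_type="depth",
    )
print(result["control_type"])
```

Replacing `fake_pipeline` with the loaded `pipeline` object runs the same call as the example, with a missing file or an empty prompt dictionary caught before any GPU work starts.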