animatediff-motion-adapter-sdxl-v1-0-beta (open-source model)

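Developed by Warvito

AnimateDiff is a method for creating videos with existing Stable Diffusion text-to-image models.

Text-to-video #Text-to-video #Stable Diffusion extension #Motion module conversion

Downloads: 65

Released: 3/15/2024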

Model overview

AnimateDiff extends a Stable Diffusion model with a motion module so that it can generate video from text.

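Model features

Video generation

Extends a static image generation model into a video generation model.

Compatibility

Works with existing Stable Diffusion models.

Motion module

A dedicated motion module keeps consecutive frames coherent.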

Model capabilities

Text-to-video generation

Video interpolation

Dynamic content creation

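Use cases

Creative content production

Short video creation

Automatically generates short video content from a text description.

Produces a coherent short video sequence.

Animation production

Simplifies the animation workflow.

Quickly generates basic animation frames.

Education

Teaching demonstrations

Visualizes abstract concepts.

Produces animated teaching material.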

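🚀 AnimateDiff project

AnimateDiff is a method that lets you create videos using existing Stable Diffusion text-to-image models.

🚀 Quick start

Model conversion

The checkpoint at https://huggingface.co/guoyww/animatediff/blob/main/mm_sdxl_v10_beta.ckpt was converted to the Hugging Face Diffusers format with a script based on the Diffusers conversion script (available at https://github.com/huggingface/diffusers/blob/main/scripts/convert_animatediff_motion_module_to_diffusers.py).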

import argparse

import torch
from diffusers import MotionAdapter


def convert_motion_module(original_state_dict):
    """Rename the original AnimateDiff keys to the names MotionAdapter uses."""
    converted_state_dict = {}
    for k, v in original_state_dict.items():
        # Positional encodings are not loaded into the adapter, so drop them.
        if "pos_encoder" in k:
            continue

        converted_state_dict[
            k.replace(".norms.0", ".norm1")
            .replace(".norms.1", ".norm2")
            .replace(".ff_norm", ".norm3")
            .replace(".attention_blocks.0", ".attn1")
            .replace(".attention_blocks.1", ".attn2")
            .replace(".temporal_transformer", "")
        ] = v

    return converted_state_dict


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--ckpt_path", type=str, required=True)
    parser.add_argument("--output_path", type=str, required=True)
    parser.add_argument("--use_motion_mid_block", action="store_true")
    parser.add_argument("--motion_max_seq_length", type=int, default=32)
    parser.add_argument("--save_fp16", action="store_true")

    return parser.parse_args()


if __name__ == "__main__":
    args = get_args()

    # Load on CPU; some checkpoints nest the weights under "state_dict".
    state_dict = torch.load(args.ckpt_path, map_location="cpu")
    if "state_dict" in state_dict:
        state_dict = state_dict["state_dict"]

    conv_state_dict = convert_motion_module(state_dict)
    # The CLI defaults (no mid block, sequence length 32) are the values used
    # for this checkpoint; block_out_channels matches the SDXL UNet.
    adapter = MotionAdapter(
        use_motion_mid_block=args.use_motion_mid_block,
        motion_max_seq_length=args.motion_max_seq_length,
        block_out_channels=(320, 640, 1280),
    )
    # strict=False because the position embeddings were skipped above
    adapter.load_state_dict(conv_state_dict, strict=False)
    adapter.save_pretrained(args.output_path)

    if args.save_fp16:
        adapter.to(torch.float16).save_pretrained(args.output_path, variant="fp16")
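
To make the key renaming in convert_motion_module easier to follow, here is a minimal sketch that applies the same replacement rules to a handful of key names. The sample keys are hypothetical, written to resemble the original AnimateDiff layout; they were not read from the real checkpoint, and rename_key and RENAMES are illustrative names that do not appear in the script.

```python
# Same replacement rules as convert_motion_module, applied in the same order.
RENAMES = [
    (".norms.0", ".norm1"),
    (".norms.1", ".norm2"),
    (".ff_norm", ".norm3"),
    (".attention_blocks.0", ".attn1"),
    (".attention_blocks.1", ".attn2"),
    (".temporal_transformer", ""),
]


def rename_key(key):
    """Return the Diffusers-style name for one key, or None if it is dropped."""
    if "pos_encoder" in key:
        return None  # positional encodings are skipped, as in the script
    for old, new in RENAMES:
        key = key.replace(old, new)
    return key


# Hypothetical key names, for illustration only.
prefix = "down_blocks.0.motion_modules.0.temporal_transformer.transformer_blocks.0"
sample_keys = [
    prefix + ".attention_blocks.0.to_q.weight",
    prefix + ".norms.1.weight",
    prefix + ".ff_norm.bias",
    prefix + ".attention_blocks.0.pos_encoder.pe",
]

converted = {k: rename_key(k) for k in sample_keys}
for old_name, new_name in converted.items():
    print(f"{old_name}\n  -> {new_name}")
```

The script itself is run from the command line with the flags it defines, for example `--ckpt_path mm_sdxl_v10_beta.ckpt --output_path <output directory> --save_fp16`.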

💻 Usage example

The converted motion module is used together with an existing Stable Diffusion text-to-image model to generate video.
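
As a rough sketch of how a converted adapter might be combined with an SDXL text-to-image model, the example below uses the AnimateDiffSDXLPipeline from Diffusers. Several things here are assumptions rather than details from this page: the base model ID, the scheduler settings, the prompt, and the helper names build_call_kwargs and generate_gif. It also assumes a recent Diffusers release and a CUDA GPU.

```python
def build_call_kwargs(prompt, num_frames=16, size=1024, steps=20, guidance_scale=8.0):
    """Collect the keyword arguments for the pipeline call (illustrative defaults)."""
    return {
        "prompt": prompt,
        "negative_prompt": "low quality, worst quality",
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
        "width": size,
        "height": size,
        "num_frames": num_frames,
    }


def generate_gif(adapter_path, base_model_id, prompt, out_path="animation.gif"):
    """Load the adapter and an SDXL base model, then render a short GIF."""
    # Heavy imports stay local so the helper above works without a GPU stack.
    import torch
    from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler, MotionAdapter
    from diffusers.utils import export_to_gif

    # adapter_path: directory written by save_pretrained, or a Hub repo ID.
    adapter = MotionAdapter.from_pretrained(adapter_path, torch_dtype=torch.float16)

    # Scheduler settings are an assumption, not taken from this page.
    scheduler = DDIMScheduler.from_pretrained(
        base_model_id,
        subfolder="scheduler",
        clip_sample=False,
        timestep_spacing="linspace",
        beta_schedule="linear",
        steps_offset=1,
    )
    pipe = AnimateDiffSDXLPipeline.from_pretrained(
        base_model_id,
        motion_adapter=adapter,
        scheduler=scheduler,
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    # Reduce peak memory while decoding the frames.
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()

    output = pipe(**build_call_kwargs(prompt))
    export_to_gif(output.frames[0], out_path)
    return out_path
```

A call could look like `generate_gif("./converted_adapter", "stabilityai/stable-diffusion-xl-base-1.0", "a panda surfing in the ocean")`, where the adapter directory is a placeholder for the --output_path passed to the conversion script.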

Property	Details
Model type	text-to-video
Training data	Not mentioned
