AutoT2VPrompt: an open-source text-to-video prompt generation model. Enter a few words and get a complete prompt.

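Developed by WenhaoWang

A text-to-video prompt generation model fine-tuned from the Mistral-7B architecture. Given a few input words, it generates a complete prompt.

Text Generation

Transformers

English

#Text-to-video prompt completion #Multi-prompt generation #Creative content support

Downloads: 26

Release date: 4/4/2024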

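Model Overview

This model is designed specifically for text-to-video tasks. From a few keywords supplied by the user, it automatically generates a complete video description prompt, and it supports generating video prompts in a variety of styles.

Model Features

Automatic prompt completion

Generates a complete text-to-video prompt from just a few input words

Diverse output

Adjusting the sampling parameters yields prompts in different styles, and each call can return multiple candidates

Trained on a specialized dataset

Fine-tuned on VidProM, a dataset dedicated to text-to-video prompts

Model Capabilities

Text generation

Prompt completion

Multi-style generation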

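Use Cases

Video creation

Short-video production

Quickly generates creative prompts for short-video platforms

Generates 10 video description prompts in different styles

Film and video production

Provides creative inspiration for professional video works

Generates video descriptions that match a specific era or style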
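One plausible way to aim for a specific era or style is to append a style phrase to the seed text before passing it to the model. The sketch below only builds such seed strings; `build_seed` is a hypothetical helper, the style phrases are illustrative, and the card does not state how closely the model follows a style given this way.

```python
def build_seed(subject: str, style: str = "") -> str:
    """Join a subject and an optional style phrase into one seed string."""
    # Drop surrounding whitespace and any trailing period or comma from the subject.
    subject = subject.strip().rstrip(".,")
    style = style.strip()
    return f"{subject}, {style}" if style else subject

# Illustrative style phrases (assumptions, not an official list).
styles = ("", "1960s horror film", "watercolor animation")
seeds = [build_seed("An underwater world", s) for s in styles]

for seed in seeds:
    print(seed)
```

Each resulting string can be used as the input text for the generation call shown in the usage section of this card.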

🚀 Automatic Text-to-Video Prompt Generation Model

Given a few words as input, this model generates a complete text-to-video prompt.

🚀 Quick Start

This model was fine-tuned from Mistral-7B-v0.1 on the VidProM dataset using 8 A100 GPUs.

📦 Installation

Download the model

from transformers import pipeline
import torch

# Load the model weights in bfloat16 and place them on the first CUDA GPU.
pipe = pipeline("text-generation", model="WenhaoWang/AutoT2VPrompt", model_kwargs={"torch_dtype": torch.bfloat16}, device_map="cuda:0")
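As a rough sanity check before loading, the GPU memory needed for the weights under the bfloat16 setting can be estimated with simple arithmetic. This sketch assumes roughly 7 billion parameters (suggested by the Mistral-7B name, not stated exactly in this card) and counts the weights only; activations and the generation cache need additional memory. `estimate_weight_memory_gib` is a hypothetical helper name.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

# Assumption: about 7e9 parameters in total.
ASSUMED_PARAMS = 7e9

bf16_gib = estimate_weight_memory_gib(ASSUMED_PARAMS, 2)  # bfloat16: 2 bytes each
fp32_gib = estimate_weight_memory_gib(ASSUMED_PARAMS, 4)  # float32: 4 bytes each

print(f"bfloat16: {bf16_gib:.1f} GiB, float32: {fp32_gib:.1f} GiB")
```

Under that assumption the weights take about 13 GiB in bfloat16, half of what float32 would need, which is why the half-precision setting matters on a single GPU.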

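Set the parameters

input = "An underwater world"      # Seed text to expand into text-to-video prompts. Note: this name shadows Python's built-in input().
max_length = 50                    # Maximum total length in tokens, counting the seed text as well as the generated text.
temperature = 1.2                  # Controls randomness; higher values give more random output.
top_k = 8                          # At each step, sample only from the k most likely tokens.
num_return_sequences = 10          # Number of different text-to-video prompts to generate from the same input.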
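To make the roles of `temperature` and `top_k` concrete, here is a toy, standard-library illustration of top-k filtering combined with temperature scaling over a handful of made-up logits. It is a sketch of the general technique, not the library's actual implementation; the function names and the logit values are assumptions for illustration.

```python
import math
import random

def top_k_temperature_probs(logits, top_k, temperature):
    """Return sampling probabilities after top-k filtering and temperature scaling."""
    if top_k <= 0 or temperature <= 0:
        raise ValueError("top_k and temperature must be positive")
    # Keep only the indices of the k highest logits.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Divide by the temperature: values above 1 flatten the distribution.
    scaled = {i: logits[i] / temperature for i in keep}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {i: math.exp(v - peak) for i, v in scaled.items()}
    total = sum(weights.values())
    # Tokens outside the top k get probability zero.
    return [weights.get(i, 0.0) / total for i in range(len(logits))]

def sample_token(logits, top_k, temperature, rng=random):
    """Draw one token index from the filtered, rescaled distribution."""
    probs = top_k_temperature_probs(logits, top_k, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # made-up scores for five tokens
sharp = top_k_temperature_probs(toy_logits, top_k=3, temperature=0.5)
flat = top_k_temperature_probs(toy_logits, top_k=3, temperature=1.2)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

With the higher temperature, probability mass shifts away from the top-scoring token toward the other kept tokens, which is what makes repeated samples from the same seed text differ from one another.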

💻 Usage Examples

Basic usage

# Sample several candidate prompts from the same seed text.
all_prompts = pipe(input, max_length = max_length, do_sample = True, temperature = temperature, top_k = top_k, num_return_sequences=num_return_sequences)

def process(text):
    # Treat line breaks as sentence boundaries.
    text = text.replace('\n', '.')
    text = text.replace('  .', '.')
    # Cut the text at the last period to drop a trailing unfinished sentence.
    text = text[:text.rfind('.')]
    text = text + '.'
    return text

for result in all_prompts:
    print(process(result['generated_text']))

This produces 10 text-to-video prompts, and you can pick whichever you like best.

An underwater world, 25 year boy, with aqua-green eyes, dark sandy blond hair, from the back, and on his back a fish, 23 year old, wearing glasses,cartoon character.
An underwater world, the video should capture the essence of tranquility and the beauty of nature.. a woman with short hair wearing a green dress sitting at the desk.
An underwater world, the ocean is full of discarded items, the water flows, and the light penetrating through the water.
An underwater world.. a woman with red eyes and red lips  is looking forward.
An underwater world.. an old man sitting in a chair, smoking a pipe, a little smoke coming out of the chair, a man is drinking a glass.
An underwater world. The ocean is filled with bioluminess as the water reflects a soft glow from a bioluminescent phosphorescent light source. The camera slowly moves away and zooms in..
An underwater world. the girl looks at the camera and smiles with happiness..
An underwater world, 1960s horror film..
An underwater world.. 4 men in 1940s style clothes walk around a gothic castle. night, fear. A girl is running, and there are some flowers along the river.
An underwater world,  -camera pan up . A girl is playing with her cat on a sunny day in the park. A man is running and then falling down and dying.
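Several of the sample outputs contain doubled periods or stray spaces, and the `process` helper trims the final character when the text contains no period at all. A stricter cleanup could look like the sketch below. `clean_prompt` and `unique_prompts` are hypothetical helpers, not part of the model's code, and the exact cleanup rules are assumptions.

```python
import re

def clean_prompt(text: str) -> str:
    """Stricter, hypothetical variant of the card's process() helper."""
    text = text.replace('\n', '.')                # line breaks become sentence boundaries
    text = re.sub(r'\s+([.,])', r'\1', text)      # remove whitespace before a period or comma
    text = re.sub(r'\.{2,}', '.', text)           # collapse runs of periods into one
    text = re.sub(r' {2,}', ' ', text).strip()    # collapse repeated spaces
    last_period = text.rfind('.')
    if last_period == -1:
        # No sentence boundary found: keep the text whole instead of trimming it.
        return text + '.'
    # Drop any unfinished sentence after the last period.
    return text[:last_period + 1]

def unique_prompts(prompts):
    """Clean each prompt and drop exact duplicates, preserving order."""
    seen, result = set(), []
    for prompt in prompts:
        cleaned = clean_prompt(prompt)
        if cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result

print(clean_prompt("An underwater world.. a woman with red lips  is looking forward. and then"))
```

Deduplicating after cleanup is a design choice: two candidates that differ only in punctuation noise collapse into one, so the list to choose from stays short.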

📄 License

This model is licensed under CC BY-NC 4.0.

📚 Documentation

Citation

@article{wang2024vidprom,
  title={VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models},
  author={Wang, Wenhao and Yang, Yi},
  journal={arXiv preprint arXiv:2403.06098},
  year={2024}
}