VisualClozePipeline-384: Open-Source Image Generation Framework



Developed by VisualCloze

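VisualCloze is a universal image generation framework based on visual in-context learning. It supports a range of in-domain tasks, generalizes to unseen tasks, and can output the target image together with intermediate results in a single generation step.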

Text-to-image · Open source · License: Apache-2.0 · #visual-in-context-learning #multi-task-image-generation #zero-shot-generalization

Downloads: 294

Release date: 4/19/2025

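Model Overview

VisualCloze handles multiple image generation tasks in a unified way through in-context learning. It can also reverse-engineer a set of generation conditions from a target image, and it supports image-to-image tasks.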

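Model Features

In-context learning

Learns new tasks from in-context examples and generalizes to unseen tasks without additional training.

Multi-task unification

Combines multiple image generation tasks into a single generation step, producing both the target image and intermediate results.

Condition reverse-engineering

Reverse-engineers a set of generation conditions from a target image, making the model more flexible and controllable.

High-resolution generation

Upsamples generated images to high resolution using SDEdit.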
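The `upsampling_strength` argument in the usage examples controls this SDEdit pass. As a rough sketch, assuming it follows the usual diffusers img2img convention (only the last `int(num_inference_steps * strength)` denoising steps run, starting from a noised copy of the upsampled image), the step arithmetic and the noising step look like this. The helper names are hypothetical, and the linear image/noise interpolation is a rectified-flow style assumption, not the pipeline's confirmed internals.

```python
import random


def sdedit_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually run in an SDEdit pass.

    Assumes the common diffusers img2img convention, where strength in [0, 1]
    selects how far back into the noise schedule the pass starts.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # The first (num_inference_steps - init_timestep) steps are skipped.
    return init_timestep


def sdedit_init(pixels: list[float], sigma: float, seed: int = 0) -> list[float]:
    """Noise an already-upsampled image (flattened pixels) to noise level sigma.

    Assumes a rectified-flow interpolation x_t = (1 - sigma) * x0 + sigma * eps:
    sigma = 0 keeps the image unchanged, sigma = 1 is pure Gaussian noise.
    """
    rng = random.Random(seed)
    return [(1.0 - sigma) * x + sigma * rng.gauss(0.0, 1.0) for x in pixels]


if __name__ == "__main__":
    # Settings used in the depth-to-image and virtual try-on examples.
    print(sdedit_steps(30, 0.4))  # 12 of 30 steps
    print(sdedit_steps(30, 0.3))  # 9 of 30 steps
```

A lower strength keeps more of the low-resolution result and runs fewer steps. With a shifted noise schedule the starting sigma is not exactly equal to the strength, which is why the sketch keeps them as separate arguments.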

Model Capabilities

Depth-to-image generation

Virtual try-on

Image restoration

Style transfer

Image super-resolution

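Use Cases

Image generation and editing

Depth-to-image generation

Generates a sharp, complete image from a depth map with detailed object contours

Produces photorealistic images

Virtual try-on

Virtually dresses a person image in a given garment

Produces realistic try-on images

Creative design

Art creation

Generates creative images from text descriptions

Images in a variety of art styles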

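🚀 VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning (Diffusers implementation)

VisualCloze is a universal image generation framework based on visual in-context learning. It supports a variety of in-domain tasks and, through in-context learning, generalizes to unseen tasks.

[Paper] [Project Page] [Github]

[🤗 Diffusers Implementation]

[🤗 Online Demo] [🤗 Dataset Card]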

[Figure: Examples]

If you find VisualCloze helpful, please consider starring the Github repository. Thanks!

🚀 Quick Start

To get started with VisualCloze, first install the required libraries by following the steps below.

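✨ Key Features

A universal image generation framework based on in-context learning.
Supports a variety of in-domain tasks.
Generalizes to unseen tasks through in-context learning.
Unifies multiple tasks into a single step, generating both the target image and intermediate results.
Can reverse-engineer a set of conditions from a target image.

🔥 See the project page for concrete examples.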

📦 Installation

Install the official diffusers directly from its GitHub repository:

pip install git+https://github.com/huggingface/diffusers.git
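`VisualClozePipeline` is only importable from diffusers builds that include it, so a quick check after installing can save a confusing import error later. A minimal sketch using only the standard library; the helper name `module_has` is hypothetical.

```python
import importlib


def module_has(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    try:
        return hasattr(module, attr)
    except Exception:  # lazily loaded packages may fail on first attribute access
        return False


if __name__ == "__main__":
    if module_has("diffusers", "VisualClozePipeline"):
        print("diffusers provides VisualClozePipeline")
    else:
        print("VisualClozePipeline not found; reinstall diffusers from GitHub")
```

The check only confirms the class is exposed; it does not verify that torch or a GPU is available.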

💻 Usage Examples

Basic Usage

Depth-to-image example:

import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image


# Load in-context images (make sure the paths are correct and accessible)
image_paths = [
    # in-context examples
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/93bc1c43af2d6c91ac2fc966bf7725a2/93bc1c43af2d6c91ac2fc966bf7725a2_depth-anything-v2_Large.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/93bc1c43af2d6c91ac2fc966bf7725a2/93bc1c43af2d6c91ac2fc966bf7725a2.jpg'),
    ],
    # query with the target image
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/79f2ee632f1be3ad64210a641c4e201b/79f2ee632f1be3ad64210a641c4e201b_depth-anything-v2_Large.jpg'),
        None,  # target slot: left empty so the pipeline generates it
    ],
]

# Task and content prompt
task_prompt = "Each row outlines a logical process, starting from [IMAGE1] gray-based depth map with detailed object contours, to achieve [IMAGE2] an image with flawless clarity."
content_prompt = """A serene portrait of a young woman with long dark hair, wearing a beige dress with intricate 
gold embroidery, standing in a softly lit room. She holds a large bouquet of pale pink roses in a black box, 
positioned in the center of the frame. The background features a tall green plant to the left and a framed artwork 
on the wall to the right. A window on the left allows natural light to gently illuminate the scene. 
The woman gazes down at the bouquet with a calm expression. Soft natural lighting, warm color palette, 
high contrast, photorealistic, intimate, elegant, visually balanced, serene atmosphere."""

# Load the VisualClozePipeline
pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Run the pipeline
image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_width=1024,
    upsampling_height=1024,
    upsampling_strength=0.4,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0][0]

# Save the resulting image
image_result.save("visualcloze.png")
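The `image` argument is a grid: each inner list is one row, the leading rows are complete in-context examples, and the last row is the query, where `None` marks the slot to generate. A small, hypothetical helper (not part of diffusers) that checks this layout and reports which query slots will be generated; the layout rules are inferred from the examples in this card, and strings stand in for loaded images.

```python
from typing import Any, Sequence


def query_slots_to_generate(grid: Sequence[Sequence[Any]]) -> list[int]:
    """Check an in-context image grid and return the query-row columns to generate.

    Assumed layout, inferred from the examples (not a diffusers API):
    every row holds the same number of images, rows before the last are
    complete in-context examples, and the last (query) row uses None for
    each slot the pipeline should fill in.
    """
    if not grid:
        raise ValueError("grid must contain at least a query row")
    width = len(grid[0])
    if any(len(row) != width for row in grid):
        raise ValueError("all rows must have the same number of images")
    *examples, query = grid
    for i, row in enumerate(examples):
        if any(img is None for img in row):
            raise ValueError(f"in-context row {i} has a missing image")
    missing = [col for col, img in enumerate(query) if img is None]
    if not missing:
        raise ValueError("query row has no None slot to generate")
    return missing


# Strings stand in for the PIL images returned by load_image.
depth_grid = [["depth_a.jpg", "image_a.jpg"], ["depth_b.jpg", None]]
tryon_grid = [
    ["person_a.jpg", "cloth_a.jpg", "tryon_a.jpg"],
    ["person_b.jpg", "cloth_b.jpg", None],
]
print(query_slots_to_generate(depth_grid))  # [1]
print(query_slots_to_generate(tryon_grid))  # [2]
```

Under the same assumption, reverse-engineering conditions from a target would put `None` in a condition slot instead, e.g. a query row `[None, target_image]`, for which the helper returns `[0]`.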

Virtual try-on example:

import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image


# Load in-context images (make sure the paths are correct and accessible)
# The images are from the VITON-HD dataset at https://github.com/shadow2496/VITON-HD
image_paths = [
    # in-context examples
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/03673_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00_tryon_catvton_0.jpg'),
    ],
    # query with the target image
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00555_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/12265_00.jpg'),
        None
    ],
]

# Task and content prompt
task_prompt = "Each row shows a virtual try-on process that aims to put [IMAGE2] the clothing onto [IMAGE1] the person, producing [IMAGE3] the person wearing the new clothing."
content_prompt = None

# Load the VisualClozePipeline
pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Run the pipeline
image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_height=1632,
    upsampling_width=1232,
    upsampling_strength=0.3,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0][0]

# Save the resulting image
image_result.save("visualcloze.png")
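In both task prompts, `[IMAGEk]` names the k-th image of each row: the depth prompt uses `[IMAGE1]` and `[IMAGE2]` for two-image rows, and the try-on prompt uses `[IMAGE1]` to `[IMAGE3]` for three-image rows. Assuming that convention (and that every column is named, as in both examples), a hypothetical pre-flight check can catch a prompt that does not match the grid width before the pipeline is called.

```python
import re

# Matches tags such as [IMAGE1] or [IMAGE12] and captures the number.
_TAG = re.compile(r"\[IMAGE(\d+)\]")


def image_tags(task_prompt: str) -> set[int]:
    """Return the set of k values for every [IMAGEk] tag in the prompt."""
    return {int(k) for k in _TAG.findall(task_prompt)}


def matches_grid(task_prompt: str, row_length: int) -> bool:
    """True if the prompt names exactly columns 1..row_length.

    Assumes [IMAGEk] means "the k-th image of every row" and that every
    column is named, as in both example prompts (not a documented rule).
    """
    return image_tags(task_prompt) == set(range(1, row_length + 1))


tryon_prompt = (
    "Each row shows a virtual try-on process that aims to put [IMAGE2] the clothing "
    "onto [IMAGE1] the person, producing [IMAGE3] the person wearing the new clothing."
)
print(sorted(image_tags(tryon_prompt)))  # [1, 2, 3]
print(matches_grid(tryon_prompt, 3))     # True
```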

📚 Documentation

Citation

If VisualCloze is useful in your research or applications, please cite it with the following BibTeX:

@article{li2025visualcloze,
  title={VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning},
  author={Li, Zhong-Yu and Du, Ruoyi and Yan, Juncheng and Zhuo, Le and Li, Zhen and Gao, Peng and Ma, Zhanyu and Cheng, Ming-Ming},
  journal={arXiv preprint arXiv:2504.07960},
  year={2025}
}