# 🚀 VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning (Diffusers Implementation)

VisualCloze is a universal image generation framework built on in-context learning. It supports a wide range of in-domain tasks, generalizes to unseen tasks through in-context learning, unifies multiple tasks into a single step that generates both the target image and intermediate results, and even supports reverse-engineering a set of conditions from a target image.

If you find VisualCloze useful, please consider giving the GitHub repo a star ⭐. Thanks!

## 🚀 Quick Start
## 📰 News

- [2025-5-15] 🤗🤗🤗 VisualCloze has been merged into the official pipelines of Diffusers.

## ✨ Key Features

This is a universal image generation framework based on in-context learning with the following features; a schematic sketch of the in-context input layout is shown below.
- Supports various in-domain tasks.
- Generalizes to unseen tasks through in-context learning.
- Unifies multiple tasks into one step, generating both the target image and intermediate results.
- Supports reverse-engineering a set of conditions from a target image.

🔥 For examples, see the project page.
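The in-context behavior is expressed through the shape of the `image` argument used in the usage examples further down: each inner list is one row of the visual grid, in-context rows are complete demonstrations of the task, and the query row leaves the image to be generated as `None`. The snippet below is only a schematic sketch of that layout; the file names are hypothetical placeholders to replace with your own condition/target images, and full runnable examples follow in the Usage section.

```python
# Schematic sketch of the in-context grid passed to the pipeline.
# The file names are placeholders; substitute your own images.
from diffusers.utils import load_image

image = [
    # in-context example row: condition and its corresponding target are both given
    [load_image("example_condition.jpg"), load_image("example_target.jpg")],
    # query row: only the condition is given; None marks the image to be generated
    [load_image("query_condition.jpg"), None],
]
```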
## 📦 Installation

You can install the official diffusers:

```bash
pip install git+https://github.com/huggingface/diffusers.git
```
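To confirm that the installed build already ships VisualCloze support, you can try importing the pipeline class used in the examples below; this check is just a convenience, not part of the upstream instructions.

```python
# Succeeds only if the installed diffusers build includes the VisualCloze pipeline.
from diffusers import VisualClozePipeline

print(VisualClozePipeline)
```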
## 💻 Usage Examples

### Basic Usage

#### Depth-to-Image Example

```python
import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image

# Load in-context images (make sure the paths are correct and accessible)
image_paths = [
    # in-context examples
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/93bc1c43af2d6c91ac2fc966bf7725a2/93bc1c43af2d6c91ac2fc966bf7725a2_depth-anything-v2_Large.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/93bc1c43af2d6c91ac2fc966bf7725a2/93bc1c43af2d6c91ac2fc966bf7725a2.jpg'),
    ],
    # query with the target image
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/79f2ee632f1be3ad64210a641c4e201b/79f2ee632f1be3ad64210a641c4e201b_depth-anything-v2_Large.jpg'),
        None,  # No image needed for the query in this case
    ],
]

# Task and content prompt
task_prompt = "Each row outlines a logical process, starting from [IMAGE1] gray-based depth map with detailed object contours, to achieve [IMAGE2] an image with flawless clarity."
content_prompt = """A serene portrait of a young woman with long dark hair, wearing a beige dress with intricate gold embroidery, standing in a softly lit room. She holds a large bouquet of pale pink roses in a black box, positioned in the center of the frame. The background features a tall green plant to the left and a framed artwork on the wall to the right. A window on the left allows natural light to gently illuminate the scene. The woman gazes down at the bouquet with a calm expression. Soft natural lighting, warm color palette, high contrast, photorealistic, intimate, elegant, visually balanced, serene atmosphere."""

# Load the VisualClozePipeline
pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Run the pipeline
image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_width=1024,
    upsampling_height=1024,
    upsampling_strength=0.4,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0][0]

# Save the resulting image
image_result.save("visualcloze.png")
```
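Note that `.images[0][0]` above selects the first image of the first query row. Judging from that nested indexing (an assumption, since the example only saves one image), `pipe(...).images` holds one list of images per query row; to keep everything that comes back, you can hold on to the pipeline output and iterate, as sketched below.

```python
# Assumption based on the `.images[0][0]` indexing used above:
# `output.images` holds one list of PIL images per query row.
output = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_width=1024,
    upsampling_height=1024,
    upsampling_strength=0.4,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
)
for i, img in enumerate(output.images[0]):  # images generated for the first (and only) query row
    img.save(f"visualcloze_depth2image_{i}.png")
```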
#### Virtual Try-On Example

<img src="./visualcloze_diffusers_example_tryon.jpg" width="60%" height="50%" alt="Virtual try-on example"/>
```python
import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image
# Load in-context images (make sure the paths are correct and accessible)
# The images are from the VITON-HD dataset at https://github.com/shadow2496/VITON-HD
image_paths = [
    # in-context examples
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/03673_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00_tryon_catvton_0.jpg'),
    ],
    # query with the target image
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00555_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/12265_00.jpg'),
        None
    ],
]

# Task and content prompt
task_prompt = "Each row shows a virtual try-on process that aims to put [IMAGE2] the clothing onto [IMAGE1] the person, producing [IMAGE3] the person wearing the new clothing."
content_prompt = None

# Load the VisualClozePipeline
pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Run the pipeline
image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_height=1632,
    upsampling_width=1232,
    upsampling_strength=0.3,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0][0]

# Save the resulting image
image_result.save("visualcloze.png")
```
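Both examples keep the whole pipeline resident on the GPU via `pipe.to("cuda")`. If that does not fit in your GPU memory, the generic diffusers offloading helper can be used instead; this is a memory-saving sketch rather than part of the upstream VisualCloze instructions, and it trades some inference speed for a smaller GPU footprint.

```python
import torch
from diffusers import VisualClozePipeline

pipe = VisualClozePipeline.from_pretrained(
    "VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16
)
# Move submodules to the GPU only while they are needed, instead of calling pipe.to("cuda").
pipe.enable_model_cpu_offload()
```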
## Citation

If you find VisualCloze useful for your research and applications, please cite it using the following BibTeX:
```bibtex
@article{li2025visualcloze,
  title={VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning},
  author={Li, Zhong-Yu and Du, Ruoyi and Yan, Juncheng and Zhuo, Le and Li, Zhen and Gao, Peng and Ma, Zhanyu and Cheng, Ming-Ming},
  journal={arXiv preprint arXiv:2504.07960},
  year={2025}
}
```
## 📄 License

This project is licensed under the Apache-2.0 License.
| Attribute | Details |
|---|---|
| Library name | diffusers |
| Dataset | VisualCloze/Graph200K |
| Base model | black-forest-labs/FLUX.1-Fill-dev |
| Pipeline tag | image-to-image |
| Tags | text-to-image, image-to-image, flux, lora, in-context learning, universal image generation, AI tools |









