# 🚀 VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning (Diffusers Implementation)

VisualCloze is a universal image generation framework built on in-context learning. It supports a wide range of in-domain tasks, generalizes to unseen tasks through in-context learning, unifies multiple tasks into a single step that generates both the target image and intermediate results, and even supports reverse-engineering a set of conditions from a target image.

If you find VisualCloze useful, please consider giving the GitHub repo a star ⭐. Thanks!

## 🚀 Quick Start
## 📰 News

- [2025-5-15] 🤗🤗🤗 VisualCloze has been merged into the official pipelines of Diffusers.

## ✨ Key Features

This is a universal image generation framework based on in-context learning with the following features; a schematic sketch of the in-context input layout is shown below.
- Supports various in-domain tasks.
- Generalizes to unseen tasks through in-context learning.
- Unifies multiple tasks into one step, generating both the target image and intermediate results.
- Supports reverse-engineering a set of conditions from a target image.

🔥 For examples, see the project page.
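The in-context behavior is expressed through the shape of the `image` argument used in the usage examples further down: each inner list is one row of the visual grid, in-context rows are complete demonstrations of the task, and the query row leaves the image to be generated as `None`. The snippet below is only a schematic sketch of that layout; the file names are hypothetical placeholders to replace with your own condition/target images, and full runnable examples follow in the Usage section.

```python
# Schematic sketch of the in-context grid passed to the pipeline.
# The file names are placeholders; substitute your own images.
from diffusers.utils import load_image

image = [
    # in-context example row: condition and its corresponding target are both given
    [load_image("example_condition.jpg"), load_image("example_target.jpg")],
    # query row: only the condition is given; None marks the image to be generated
    [load_image("query_condition.jpg"), None],
]
```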
## 📦 Installation

You can install the official diffusers:

```bash
pip install git+https://github.com/huggingface/diffusers.git
```
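To confirm that the installed build already ships VisualCloze support, you can try importing the pipeline class used in the examples below; this check is just a convenience, not part of the upstream instructions.

```python
# Succeeds only if the installed diffusers build includes the VisualCloze pipeline.
from diffusers import VisualClozePipeline

print(VisualClozePipeline)
```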
## 💻 Usage Examples

### Basic Usage

#### Depth-to-Image Example

```python
import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image

# Load in-context images (make sure the paths are correct and accessible)
image_paths = [
    # in-context examples
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/93bc1c43af2d6c91ac2fc966bf7725a2/93bc1c43af2d6c91ac2fc966bf7725a2_depth-anything-v2_Large.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/93bc1c43af2d6c91ac2fc966bf7725a2/93bc1c43af2d6c91ac2fc966bf7725a2.jpg'),
    ],
    # query with the target image
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/79f2ee632f1be3ad64210a641c4e201b/79f2ee632f1be3ad64210a641c4e201b_depth-anything-v2_Large.jpg'),
        None,  # No image needed for the query in this case
    ],
]

# Task and content prompt
task_prompt = "Each row outlines a logical process, starting from [IMAGE1] gray-based depth map with detailed object contours, to achieve [IMAGE2] an image with flawless clarity."
content_prompt = """A serene portrait of a young woman with long dark hair, wearing a beige dress with intricate gold embroidery, standing in a softly lit room. She holds a large bouquet of pale pink roses in a black box, positioned in the center of the frame. The background features a tall green plant to the left and a framed artwork on the wall to the right. A window on the left allows natural light to gently illuminate the scene. The woman gazes down at the bouquet with a calm expression. Soft natural lighting, warm color palette, high contrast, photorealistic, intimate, elegant, visually balanced, serene atmosphere."""

# Load the VisualClozePipeline
pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Run the pipeline
image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_width=1024,
    upsampling_height=1024,
    upsampling_strength=0.4,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0][0]

# Save the resulting image
image_result.save("visualcloze.png")
```
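Note that `.images[0][0]` above selects the first image of the first query row. Judging from that nested indexing (an assumption, since the example only saves one image), `pipe(...).images` holds one list of images per query row; to keep everything that comes back, you can hold on to the pipeline output and iterate, as sketched below.

```python
# Assumption based on the `.images[0][0]` indexing used above:
# `output.images` holds one list of PIL images per query row.
output = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_width=1024,
    upsampling_height=1024,
    upsampling_strength=0.4,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
)
for i, img in enumerate(output.images[0]):  # images generated for the first (and only) query row
    img.save(f"visualcloze_depth2image_{i}.png")
```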
#### Virtual Try-On Example

<img src="./visualcloze_diffusers_example_tryon.jpg" width="60%" height="50%" alt="Virtual try-on example"/>
```python
import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image
# Load in-context images (make sure the paths are correct and accessible)
# The images are from the VITON-HD dataset at https://github.com/shadow2496/VITON-HD
image_paths = [
    # in-context examples
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/03673_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00_tryon_catvton_0.jpg'),
    ],
    # query with the target image
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00555_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/12265_00.jpg'),
        None
    ],
]

# Task and content prompt
task_prompt = "Each row shows a virtual try-on process that aims to put [IMAGE2] the clothing onto [IMAGE1] the person, producing [IMAGE3] the person wearing the new clothing."
content_prompt = None

# Load the VisualClozePipeline
pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Run the pipeline
image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_height=1632,
    upsampling_width=1232,
    upsampling_strength=0.3,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0][0]

# Save the resulting image
image_result.save("visualcloze.png")
```
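Both examples keep the whole pipeline resident on the GPU via `pipe.to("cuda")`. If that does not fit in your GPU memory, the generic diffusers offloading helper can be used instead; this is a memory-saving sketch rather than part of the upstream VisualCloze instructions, and it trades some inference speed for a smaller GPU footprint.

```python
import torch
from diffusers import VisualClozePipeline

pipe = VisualClozePipeline.from_pretrained(
    "VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16
)
# Move submodules to the GPU only while they are needed, instead of calling pipe.to("cuda").
pipe.enable_model_cpu_offload()
```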
## Citation

If you find VisualCloze useful for your research and applications, please cite it using the following BibTeX:
```bibtex
@article{li2025visualcloze,
  title={VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning},
  author={Li, Zhong-Yu and Du, Ruoyi and Yan, Juncheng and Zhuo, Le and Li, Zhen and Gao, Peng and Ma, Zhanyu and Cheng, Ming-Ming},
  journal={arXiv preprint arXiv:2504.07960},
  year={2025}
}
```
## 📄 License

This project is licensed under the Apache-2.0 License.
| Attribute | Details |
|---|---|
| Library name | diffusers |
| Dataset | VisualCloze/Graph200K |
| Base model | black-forest-labs/FLUX.1-Fill-dev |
| Pipeline tag | image-to-image |
| Tags | text-to-image, image-to-image, flux, lora, in-context learning, universal image generation, AI tools |









