InstructCLIP-InstructPix2Pix: Open-Source Image Editing Model - Edit Images with Text Instructions

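InstructCLIP InstructPix2Pix

Developed by SherryXTChen

InstructCLIP is an instruction-guided image editing model improved through automated data refinement with contrastive learning. It combines CLIP and Stable Diffusion to edit images according to text instructions.

Text-to-image · English · License: Apache-2.0 · #Instruction-guided image editing · #Contrastive learning refinement · #Multimodal instruction understanding

Downloads: 450

Released: 3/15/2025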

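Model Overview

The model is built on the Stable Diffusion architecture and uses CLIP-style contrastive learning. It focuses on image editing guided by text instructions.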

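Model Features

Instruction-guided editing

Edits and transforms images according to natural-language instructions

Contrastive learning refinement

Uses CLIP-style contrastive learning to automatically refine the quality of the training data

Stable Diffusion foundation

Builds on the Stable Diffusion architecture for high-quality image generation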

Model Capabilities

Text-to-image conversion

Image-to-image conversion

Instruction-based image editing

Image style transfer

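Use Cases

Creative design

Artistic style transfer

Convert ordinary photos into 3D sculptures or other artistic styles

The example shows a photo converted into a 3D sculpture

Content creation

Image content modification

Modify specific elements of an image according to a text instruction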

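🚀 InstructCLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning

This model is based on the paper Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning. It uses contrastive learning to automatically refine the training data, which improves the results of instruction-guided image editing.

GitHub: https://github.com/SherryXTChen/Instruct-CLIP.git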
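
This page does not spell out the contrastive objective, so the following is only a minimal sketch of a generic CLIP-style symmetric contrastive (InfoNCE) loss, not the paper's exact formulation. It assumes each training pair has one embedding for the visual edit and one for the instruction text. The function names, the temperature of 0.07, and the toy embeddings are all illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(edit_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (edit, instruction) pairs.

    Hypothetical helper: pair i in edit_embs is assumed to match pair i in text_embs.
    """
    edit_embs = [l2_normalize(v) for v in edit_embs]
    text_embs = [l2_normalize(v) for v in text_embs]
    n = len(edit_embs)
    # logits[i][j]: cosine similarity of edit i and instruction j, divided by temperature.
    logits = [
        [sum(a * b for a, b in zip(e, t)) / temperature for t in text_embs]
        for e in edit_embs
    ]
    # Edit -> text direction: row i should assign the highest score to column i.
    loss_e2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> edit direction: column j should assign the highest score to row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2e = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_e2t + loss_t2e)


# Toy data: correctly matched pairs versus pairs whose instructions are swapped.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

A low loss means an instruction agrees with the edit it is paired with, and a high loss flags a mismatch. That kind of signal is what a data refinement step could use to detect or correct instructions that do not describe the actual edit.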

🚀 Quick Start

The sections below cover installation and a basic usage example for editing an image with this model.

✨ Key Features

Multiple base models: builds on timbrooks/instruct-pix2pix, SherryXTChen/Instruct-CLIP, and SherryXTChen/LatentDiffusionDINOv2.
Dedicated dataset: trained on the SherryXTChen/InstructCLIP-InstructPix2Pix-Data dataset, which targets instruction-guided image editing.
Multiple domains: tagged for Stable Diffusion, text-to-image, and image-to-image use.

📦 Installation

The model card does not give specific installation steps. See the official diffusers documentation for installation instructions.
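
Because no installation steps are given, a quick environment check can help before running the example. The sketch below reports which packages cannot be found. The package list is an assumption inferred from the imports in the usage example plus libraries that diffusers commonly relies on for LoRA loading (`transformers`, `accelerate`, `peft`); adjust it to your setup.

```python
import importlib.util

# Assumed import names, not an official requirements list.
REQUIRED = ["torch", "diffusers", "transformers", "accelerate", "peft", "PIL", "requests"]


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All assumed dependencies are importable.")
```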

💻 Usage Examples

Basic Usage

import PIL.Image
import PIL.ImageOps
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

# Load the InstructPix2Pix base pipeline, then apply the InstructCLIP LoRA weights.
model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.load_lora_weights("SherryXTChen/InstructCLIP-InstructPix2Pix")
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

url = "https://raw.githubusercontent.com/SherryXTChen/Instruct-CLIP/refs/heads/main/assets/1_input.jpg"


def download_image(url):
    """Download an image, apply its EXIF orientation, and convert it to RGB."""
    response = requests.get(url, stream=True, timeout=30)
    response.raise_for_status()
    image = PIL.Image.open(response.raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image


image = download_image(url)

# Edit the image according to the text instruction and save the first result.
prompt = "as a 3 d sculpture"
images = pipe(prompt, image=image, num_inference_steps=20).images
images[0].save("output.jpg")

Advanced Usage

The model card does not include advanced usage examples. Extend the basic usage code to fit your needs.
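
One way to extend the basic example is to tune the guidance settings of the pipeline call. The sketch below is an illustration, not part of the model card. It relies on the `guidance_scale`, `image_guidance_scale`, `num_inference_steps`, and `generator` arguments of `StableDiffusionInstructPix2PixPipeline`. The helper names `build_edit_kwargs` and `edit_image` are hypothetical, and the default values are assumptions, not settings recommended by the authors.

```python
def build_edit_kwargs(num_inference_steps=20, guidance_scale=7.5,
                      image_guidance_scale=1.5, generator=None):
    """Collect and sanity-check keyword arguments for the pipeline call (hypothetical helper)."""
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    if image_guidance_scale < 1.0:
        # The diffusers pipeline documents a minimum value of 1 for this argument.
        raise ValueError("image_guidance_scale must be at least 1")
    kwargs = {
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,              # higher: follow the text instruction more closely
        "image_guidance_scale": image_guidance_scale,  # higher: stay closer to the input image
    }
    if generator is not None:
        kwargs["generator"] = generator                # e.g. a seeded torch.Generator for reproducibility
    return kwargs


def edit_image(pipe, image, prompt, **overrides):
    """Run one edit with the given pipeline and return the first output image."""
    return pipe(prompt, image=image, **build_edit_kwargs(**overrides)).images[0]


# Example settings only; `pipe` and `image` would come from the basic usage code:
# result = edit_image(pipe, image, "as a 3 d sculpture", image_guidance_scale=1.8)
print(build_edit_kwargs(image_guidance_scale=1.8))
```

Raising `image_guidance_scale` generally keeps the result closer to the input photo, while raising `guidance_scale` pushes the edit harder toward the instruction. Suitable values depend on the image and the prompt.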

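📚 Documentation

The model card does not provide detailed usage notes or parameter explanations. See the official diffusers documentation for more information.

🔧 Technical Details

The model card does not describe the implementation in detail. For the underlying method, see the paper Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning.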

📄 License

This project is licensed under Apache-2.0.

📋 Model Information

Property	Details
Base models	timbrooks/instruct-pix2pix, SherryXTChen/Instruct-CLIP, SherryXTChen/LatentDiffusionDINOv2
Dataset	SherryXTChen/InstructCLIP-InstructPix2Pix-Data
Language	en
Library name	diffusers
License	apache-2.0
Tags	stable-diffusion, stable-diffusion-diffusers, text-to-image, diffusers, diffusers-training, image-to-image
Inference	true
Pipeline tag	image-to-image