InstructCV: an open-source text-to-image model that performs diverse vision tasks from natural-language instructions


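Developed by alaa-lab

InstructCV is an instruction-tuned text-to-image diffusion model that performs a range of computer vision tasks from natural-language instructions.

Image generation | License: MIT | #instruction-based image editing #vision generalist model #text-guided image processing

Downloads: 20

Released: 7/2/2023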

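Model Overview

InstructCV is a vision generalist model. Built on an instruction-tuned text-to-image diffusion model, it interprets natural-language instructions for computer vision tasks and carries them out.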

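Model Features

Instruction-driven visual processing

Performs computer vision tasks specified as natural-language instructions

Multi-purpose vision generalist

Handles several different types of vision tasks, such as object detection and image editing

Diffusion-based

Uses a diffusion model to produce high-quality image outputs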

Model Capabilities

Object detection in images

Image editing

Instruction-based image transformation

Vision task execution

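Use Cases

Computer vision

Person detection

Detect people in an image from a natural-language instruction

Output: an image containing the detection results

Image editing

Edit and modify an image according to a text instruction

Output: the edited image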

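🚀 InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists

InstructCV is an instruction-tuned text-to-image diffusion model that serves as a vision generalist. It handles image-to-image tasks, among others, and is trained on datasets relevant to those tasks.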

🚀 Quick Start

To use InstructCV, you currently need to install diffusers from the main branch. The pipeline will be officially available in the next release.

📦 Installation

pip install diffusers accelerate safetensors transformers

💻 Usage Examples

Basic usage

import math
import random

import requests
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

# Load the InstructCV weights into the InstructPix2Pix pipeline
model_id = "yulu2/InstructCV"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None, variant="ema")
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

url = "put your url here"  # replace with the URL of the input image

def download_image(url):
    # Fetch the image, apply its EXIF orientation, and convert it to RGB
    image = Image.open(requests.get(url, stream=True).raw)
    image = ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image

image         = download_image(url)
seed          = random.randint(0, 100000)
generator     = torch.manual_seed(seed)

# Scale toward a 512-pixel longer side and snap both sides to multiples of 64
width, height = image.size
factor        = 512 / max(width, height)
factor        = math.ceil(min(width, height) * factor / 64) * 64 / min(width, height)
width         = int((width * factor) // 64) * 64
height        = int((height * factor) // 64) * 64
image         = ImageOps.fit(image, (width, height), method=Image.Resampling.LANCZOS)

prompt        = "Detect the person."
images        = pipe(prompt, image=image, num_inference_steps=100, generator=generator).images
images[0]  # first output image (displayed when run in a notebook)
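
The resizing arithmetic in the example above can be easier to follow in isolation. The sketch below repeats the same calculation as a standalone function that needs neither the model nor an image. The function name `fit_size` and its `target` and `multiple` parameters are hypothetical, introduced here only for illustration; they are not part of diffusers or InstructCV.

```python
import math


def fit_size(width, height, target=512, multiple=64):
    """Return (width, height) scaled toward `target` and snapped to `multiple`.

    Mirrors the arithmetic in the usage example; the function name and the
    `target` / `multiple` parameters are hypothetical, added for illustration.
    """
    short_side = min(width, height)
    # First scale so the longer side would equal `target`.
    factor = target / max(width, height)
    # Then adjust so the shorter side rounds up to a multiple of `multiple`.
    factor = math.ceil(short_side * factor / multiple) * multiple / short_side
    # Finally round both sides down to a multiple of `multiple`.
    new_width = int((width * factor) // multiple) * multiple
    new_height = int((height * factor) // multiple) * multiple
    return new_width, new_height


print(fit_size(1024, 768))   # (512, 384)
print(fit_size(2048, 1024))  # (512, 256)
```

Because both sides are snapped to a multiple of 64, the result may not match the input aspect ratio exactly; in the usage example, ImageOps.fit crops the image to the computed size.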