InstructCV: An Open-Source Text-to-Image Model That Performs Diverse Vision Tasks from Natural-Language Instructions


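Developed by alaa-lab

InstructCV is an instruction-tuned text-to-image diffusion model that performs a range of computer vision tasks from natural-language instructions.

Category: Image generation | License: MIT | Tags: #instruction-based-image-editing #vision-generalist-model #text-guided-image-processing

Downloads: 20

Released: 7/2/2023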

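Model Overview

InstructCV is a vision generalist model. Built by instruction-tuning a text-to-image diffusion model, it interprets natural-language instructions and carries out the corresponding computer vision task.

Model Features

Instruction-driven visual processing

Computer vision tasks are specified and executed through natural-language instructions.

Versatile vision generalist

Handles several kinds of vision tasks, such as detection and image editing.

Diffusion-based

Uses a text-to-image diffusion model as the backbone for image processing.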

Model Capabilities

Image detection

Image editing

Instruction-based image transformation

Vision task execution

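Use Cases

Computer vision

Person detection

Detect people in an image from a natural-language instruction.

Output: an image containing the detection results.

Image editing

Edit and modify an image according to a text instruction.

Output: the edited image.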
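The person-detection use case is driven by a short text instruction; the usage example in this card uses the prompt "Detect the person.". As a minimal sketch, a prompt of the same shape could be built for other categories. The helper name `detection_prompt` is hypothetical, and whether the model responds well to categories other than "person" is an assumption not confirmed by this card.

```python
def detection_prompt(category: str) -> str:
    """Build an instruction shaped like "Detect the person." (hypothetical helper)."""
    category = category.strip()
    if not category:
        raise ValueError("category must be a non-empty string")
    # Assumption: other categories follow the same "Detect the <category>." pattern.
    return f"Detect the {category}."


print(detection_prompt("person"))  # Detect the person.
```

The returned string would be passed as the `prompt` argument of the pipeline call shown in the usage example.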

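🚀 InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists

InstructCV is an instruction-tuned text-to-image diffusion model that serves as a vision generalist. It works as an image-to-image model: given an input image and a text instruction, it produces an output image that carries the task result. It is trained on datasets for the relevant vision tasks.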

🚀 Quick Start

To use InstructCV, install diffusers from the main branch for now. The pipeline will be officially available in the next release.

📦 Installation

pip install diffusers accelerate safetensors transformers
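Because the pipeline may be missing from older diffusers releases, it can help to check the installed package before running the example. The following is a minimal sketch; `has_attribute` is a hypothetical helper that only reports whether a module imports and exposes a given name, not whether the pipeline will run on your hardware.

```python
import importlib


def has_attribute(module_name: str, attribute: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
        # Lazily loaded packages may raise on attribute access, so probe it here.
        getattr(module, attribute)
    except (ImportError, AttributeError, RuntimeError):
        return False
    return True


if __name__ == "__main__":
    name = "StableDiffusionInstructPix2PixPipeline"
    if has_attribute("diffusers", name):
        print(f"diffusers exposes {name}")
    else:
        print(f"{name} not found; try installing diffusers from the main branch")
```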

💻 Usage Example

Basic usage

import math
import random

import requests
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

# Load the InstructCV weights into the InstructPix2Pix pipeline (half precision, on GPU).
model_id = "yulu2/InstructCV"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None, variant="ema")
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

url = "put your url here"  # replace with the URL of the input image

def download_image(url):
    # Fetch the image, apply its EXIF orientation, and convert it to RGB.
    image = Image.open(requests.get(url, stream=True).raw)
    image = ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image

image         = download_image(url)
seed          = random.randint(0, 100000)
generator     = torch.manual_seed(seed)

# Scale so the longer side is about 512 px, then snap both sides to multiples of 64.
width, height = image.size
factor        = 512 / max(width, height)
factor        = math.ceil(min(width, height) * factor / 64) * 64 / min(width, height)
width         = int((width * factor) // 64) * 64
height        = int((height * factor) // 64) * 64
image         = ImageOps.fit(image, (width, height), method=Image.Resampling.LANCZOS)

prompt        = "Detect the person."
images        = pipe(prompt, image=image, num_inference_steps=100, generator=generator).images
images[0]  # output image (PIL.Image); displayed automatically in a notebook
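The resizing arithmetic in the example can be isolated and checked without a GPU. The sketch below mirrors it in a helper: scale so the longer side is about 512 px, round the shorter side up to a multiple of 64, then floor both sides to multiples of 64. The function name `target_size` and its `long_side` and `multiple` parameters are introduced here for illustration only.

```python
import math


def target_size(width: int, height: int, long_side: int = 512, multiple: int = 64) -> tuple[int, int]:
    """Mirror the resizing arithmetic of the usage example (illustrative helper)."""
    # First pass: scale so the longer side matches `long_side`.
    factor = long_side / max(width, height)
    # Second pass: adjust so the shorter side rounds up to a multiple of `multiple`.
    factor = math.ceil(min(width, height) * factor / multiple) * multiple / min(width, height)
    # Floor both sides to multiples of `multiple`.
    new_width = int((width * factor) // multiple) * multiple
    new_height = int((height * factor) // multiple) * multiple
    return new_width, new_height


print(target_size(1024, 768))   # (512, 384)
print(target_size(2048, 1024))  # (512, 256)
```

Because the shorter side is rounded up, the longer side can end up slightly above 512 for some aspect ratios; both sides are always multiples of 64.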