blip-large-long-cap: Open-Source Image Caption Generator, Free for Text-to-Image Prompts and Dataset Annotation

Blip Large Long Cap

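Developed by unography

A long-form image caption generator fine-tuned from BLIP, suited to text-to-image prompts and image dataset annotation

Image-to-Text

Transformers

License: BSD-3-Clause #Long-form image captioning #Text-to-image prompt generation #Image dataset annotation

Downloads: 26.87k

Released: 4/16/2024

Model Overview

This image captioning model is fine-tuned from the BLIP architecture and optimized for generating long captions. It is suited to producing prompts for text-to-image generation and to annotating image datasets.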

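Model Features

Long-form caption generation

Optimized for long image captions, with a maximum length of up to 300 tokens

Multi-scenario applicability

Generates captions for a wide range of images, including natural scenes and human activities

Conditional and unconditional generation

Supports both conditional and unconditional image caption generation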

Model Capabilities

Image-to-text conversion

Long-form caption generation

Image content analysis

Multi-scenario image understanding

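Use Cases

Text-to-Image Generation

AI art prompt generation

Provides detailed descriptive prompts for text-to-image generation systems

Generates detailed prompt text that AI art systems can consume

Image Dataset Annotation

Automatic image annotation

Generates detailed descriptive annotations for image datasets

Reduces manual annotation effort and speeds up dataset labeling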
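
As a rough illustration of the annotation workflow, the sketch below walks a folder of images and writes one JSON record per image to a JSONL file. Everything in it is an assumption made for illustration: the `annotate_folder` helper, the `file_name` and `text` field names, the extension list, and the `caption_fn` callable, which stands in for whatever function wraps the model and returns a caption string for an image path.

```python
import json
from pathlib import Path

# Assumed set of image file types; adjust to the dataset at hand.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}

def annotate_folder(image_dir, output_path, caption_fn):
    """Write one {"file_name", "text"} JSON line per image in `image_dir`.

    `caption_fn` is any callable that takes an image path and returns a caption.
    Returns the number of records written.
    """
    # Sort the paths so the output order is reproducible.
    image_paths = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    with open(output_path, "w", encoding="utf-8") as out_file:
        for path in image_paths:
            record = {"file_name": path.name, "text": caption_fn(path)}
            out_file.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(image_paths)
```

Passing the captioner in as a callable keeps the file handling separate from model loading, so the same loop works whether the model runs on CPU or GPU.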

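🚀 LongCap: A BLIP-Fine-Tuned Model for Long Image Captions

The LongCap model is fine-tuned from BLIP to generate long image captions. It is suited to producing text-to-image prompts and to captioning text-to-image datasets.

🚀 Quick Start

This model can be used for both conditional and unconditional image captioning.

✨ Key Features

Long caption generation: produces long captions that give an image rich textual detail.
Broad applicability: suited to generating text-to-image prompts and to captioning text-to-image datasets.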

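📦 Installation Guide

The source documentation does not give installation steps. Follow the general installation procedure used for comparable models on Hugging Face.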
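
One way to confirm an environment is ready is to check that the modules imported by the usage examples can be found. The sketch below is not part of the model's documentation: `missing_packages` is a hypothetical helper, and the module list is simply read off the import statements in the examples (`PIL` is provided by the Pillow distribution).

```python
import importlib.util

# Top-level import names used by the usage examples.
REQUIRED_MODULES = ["requests", "PIL", "transformers", "torch"]

def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules used by the examples are available.")
```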

💻 Usage Examples

Basic Usage

You can use this model for both conditional and unconditional image captioning.

Using the PyTorch Model

Running the Model on CPU

import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor (image preprocessing and tokenizer) and the fine-tuned model.
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap")

# Download the demo image and convert it to RGB.
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Unconditional captioning: only the image is passed to the model.
inputs = processor(raw_image, return_tensors="pt")
pixel_values = inputs.pixel_values
out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
# Example output:
# a woman sitting on the beach, wearing a checkered shirt and a dog collar. the woman is interacting with the dog, which is positioned towards the left side of the image. the setting is a beachfront with a calm sea and a golden hue.
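
The example above is unconditional: only the image is passed in. For the conditional mode, the BLIP classes in `transformers` accept a text prefix alongside the image, and the generated caption continues from it. The sketch below wraps both modes in one function. `caption_image` and its `prompt` argument are hypothetical names, the prefix in the usage comment is an arbitrary choice, and how well this fine-tuned checkpoint follows a prefix is not stated in the source.

```python
def caption_image(model, processor, image, prompt=None, max_length=250):
    """Caption `image`, optionally continuing from a text prefix (`prompt`).

    `model` and `processor` are expected to behave like the
    BlipForConditionalGeneration and BlipProcessor objects loaded above.
    """
    if prompt is None:
        # Unconditional: the processor output contains only pixel values.
        inputs = processor(image, return_tensors="pt")
    else:
        # Conditional: the prefix is tokenized and passed along with the image.
        inputs = processor(image, prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_length)
    return processor.decode(output_ids[0], skip_special_tokens=True)

# Usage, assuming `model`, `processor`, and `raw_image` from the CPU example:
# print(caption_image(model, processor, raw_image))
# print(caption_image(model, processor, raw_image, prompt="a photograph of"))
```

This version assumes the model is on the CPU. On a GPU, the processor output would also need to be moved to the model's device before calling `generate`.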

Advanced Usage

Running the Model on GPU

Full Precision

import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor, then load the model and move it to the GPU.
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap").to("cuda")

# Download the demo image and convert it to RGB.
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# The inputs must be on the same device as the model.
inputs = processor(raw_image, return_tensors="pt").to("cuda")
pixel_values = inputs.pixel_values
out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
# Example output:
# a woman sitting on the beach, wearing a checkered shirt and a dog collar. the woman is interacting with the dog, which is positioned towards the left side of the image. the setting is a beachfront with a calm sea and a golden hue.

Half Precision (`float16`)

import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the weights in float16 and move the model to the GPU.
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap", torch_dtype=torch.float16).to("cuda")

# Download the demo image and convert it to RGB.
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Move the inputs to the GPU and cast them to float16 to match the model.
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
pixel_values = inputs.pixel_values
out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
# Example output:
# a woman sitting on the beach, wearing a checkered shirt and a dog collar. the woman is interacting with the dog, which is positioned towards the left side of the image. the setting is a beachfront with a calm sea and a golden hue.
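
A common reason to run in `float16` is memory: each parameter takes 2 bytes instead of 4, so the weights occupy roughly half the space. The sketch below makes that arithmetic explicit. `weight_memory_bytes` is a hypothetical helper, it counts weights only (activations and generation buffers are excluded), and the parameter count in the usage lines is a placeholder rather than this model's real size.

```python
# Storage size of one parameter, in bytes, for the two precisions used above.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

def weight_memory_bytes(num_params, dtype):
    """Estimate the memory taken by model weights alone, in bytes."""
    return num_params * BYTES_PER_PARAM[dtype]

# Placeholder count for illustration only; the real figure can be read from a
# loaded model with: sum(p.numel() for p in model.parameters())
example_params = 100_000_000
for dtype in ("float32", "float16"):
    mib = weight_memory_bytes(example_params, dtype) / 2**20
    print(f"{dtype}: {mib:.1f} MiB")
```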