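🚀 LongCap: A BLIP Fine-Tuned Model for Long Image Captions
LongCap is fine-tuned from BLIP and generates long image captions. It is suited to writing text-to-image prompts and to captioning text-to-image datasets.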
🚀 Quick Start
The model supports both conditional and unconditional image captioning.
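✨ Key Features
- Long captions: generates long descriptions that give each image rich textual information.
- Broad applicability: suited to generating text-to-image prompts and to annotating text-to-image datasets.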
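For the dataset-annotation use case, one possible way to store generated captions is a JSON Lines file with one record per image. This is a minimal sketch, not part of the model card: the `write_metadata` helper is hypothetical, and the `file_name` / `text` field names are an assumption borrowed from the `metadata.jsonl` layout commonly used with image-folder datasets. The captions are assumed to have been generated already.

```python
import json
from pathlib import Path
from typing import Dict


def write_metadata(captions: Dict[str, str], out_path: str) -> int:
    """Write one JSON object per image: {"file_name": ..., "text": ...}.

    `captions` maps an image file name to its generated caption.
    The field names are an assumption, not something the model requires.
    """
    path = Path(out_path)
    with path.open("w", encoding="utf-8") as f:
        # Sort by file name so the output is stable across runs.
        for file_name, caption in sorted(captions.items()):
            record = {"file_name": file_name, "text": caption.strip()}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(captions)
```

Calling `write_metadata({"0001.jpg": caption}, "metadata.jsonl")` next to the image files would then produce one line per image.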
📦 Installation
The model card does not list installation steps. The general setup used for other Hugging Face models applies.
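Since no steps are given, a quick check of the packages that the usage examples import (`torch`, `transformers`, `PIL`, `requests`) may help. The sketch below is an assumption about what is needed, inferred only from those imports. The `missing_packages` helper and the `REQUIRED` mapping are hypothetical names, and no version constraints are implied.

```python
import importlib.util
from typing import Dict, List

# Import name -> pip package name, inferred from the imports in the usage examples.
REQUIRED: Dict[str, str] = {
    "torch": "torch",
    "transformers": "transformers",
    "PIL": "pillow",
    "requests": "requests",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Suggest a single install command for whatever is absent.
        print("pip install " + " ".join(missing))
    else:
        print("All packages used by the examples are importable.")
```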
💻 Usage Examples
Basic Usage
You can use this model for both conditional and unconditional image captioning.
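The examples in this card only show unconditional captioning. Conditional captioning with BLIP passes a text prefix to the processor together with the image, and the decoded caption normally begins with that prefix. The sketch below shows both modes in one function. `generate_caption` and `strip_prompt` are hypothetical helper names, and how well this fine-tuned checkpoint follows a prompt is not stated in the card, so treat the conditional path as an assumption to verify.

```python
from typing import Optional


def strip_prompt(caption: str, prompt: str) -> str:
    """Drop a leading prompt from a decoded caption (case-insensitive match)."""
    if prompt and caption.lower().startswith(prompt.lower()):
        return caption[len(prompt):].lstrip()
    return caption


def generate_caption(image, prompt: Optional[str] = None, max_length: int = 250) -> str:
    """Caption a PIL image, optionally conditioned on a text prefix."""
    # Heavy imports are deferred so strip_prompt stays usable without them.
    from transformers import BlipProcessor, BlipForConditionalGeneration

    name = "unography/blip-large-long-cap"
    processor = BlipProcessor.from_pretrained(name)
    model = BlipForConditionalGeneration.from_pretrained(name)

    if prompt is None:
        # Unconditional: the processor returns only pixel values.
        inputs = processor(image, return_tensors="pt")
    else:
        # Conditional: the prompt is tokenized and used as the caption prefix.
        inputs = processor(image, prompt, return_tensors="pt")

    out = model.generate(**inputs, max_length=max_length)
    return processor.decode(out[0], skip_special_tokens=True)
```

For a conditional call, `strip_prompt(generate_caption(raw_image, "a photograph of"), "a photograph of")` would return the caption without the prefix.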
Using the PyTorch Model
Running the Model on CPU
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor and the fine-tuned captioning model from the Hugging Face Hub
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap")

# Download the demo image and convert it to RGB
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Preprocess the image, then generate an unconditional caption
inputs = processor(raw_image, return_tensors="pt")
pixel_values = inputs.pixel_values
out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
# >>> a woman sitting on the beach, wearing a checkered shirt and a dog collar. the woman is interacting with the dog, which is positioned towards the left side of the image. the setting is a beachfront with a calm sea and a golden hue.
Advanced Usage
Running the Model on GPU
Full Precision
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor, then load the model and move it to the GPU
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap").to("cuda")

# Download the demo image and convert it to RGB
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Preprocess the image and move the tensors to the same device as the model
inputs = processor(raw_image, return_tensors="pt").to("cuda")
pixel_values = inputs.pixel_values
out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
# >>> a woman sitting on the beach, wearing a checkered shirt and a dog collar. the woman is interacting with the dog, which is positioned towards the left side of the image. the setting is a beachfront with a calm sea and a golden hue.
Half Precision (float16)
import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor, then load the model weights as float16 and move them to the GPU
processor = BlipProcessor.from_pretrained("unography/blip-large-long-cap")
model = BlipForConditionalGeneration.from_pretrained("unography/blip-large-long-cap", torch_dtype=torch.float16).to("cuda")

# Download the demo image and convert it to RGB
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Preprocess the image; cast the pixel values to float16 to match the model
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
pixel_values = inputs.pixel_values
out = model.generate(pixel_values=pixel_values, max_length=250)
print(processor.decode(out[0], skip_special_tokens=True))
# >>> a woman sitting on the beach, wearing a checkered shirt and a dog collar. the woman is interacting with the dog, which is positioned towards the left side of the image. the setting is a beachfront with a calm sea and a golden hue.
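The CPU, full-precision GPU, and half-precision GPU examples differ only in the device and dtype they use. A small helper can pick both in one place. This is a sketch under stated assumptions: `select_device` is a hypothetical name, and falling back to float32 on CPU is a conservative choice of this sketch rather than a requirement stated in the card.

```python
from typing import Tuple


def select_device(cuda_available: bool, prefer_half: bool = True) -> Tuple[str, str]:
    """Return (device, dtype name) for loading the model.

    Half precision is only chosen when a GPU is available;
    on CPU the sketch always falls back to float32.
    """
    if cuda_available:
        return ("cuda", "float16" if prefer_half else "float32")
    return ("cpu", "float32")


# Intended use with PyTorch (not executed here):
#   import torch
#   device, dtype_name = select_device(torch.cuda.is_available())
#   dtype = getattr(torch, dtype_name)
#   model = BlipForConditionalGeneration.from_pretrained(
#       "unography/blip-large-long-cap", torch_dtype=dtype
#   ).to(device)
#   inputs = processor(raw_image, return_tensors="pt").to(device, dtype)
```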
📚 Documentation
Model Information
| Property | Details |
|---|---|
| Model type | Image captioning model fine-tuned from BLIP |
| Training data | unography/laion-14k-GPT4V-LIVIS-Captions |
| Inference parameters | Maximum length: 300 |
Example Images
📄 License
This model is released under the BSD 3-Clause License.