# 🚀 Fine-tuned Image Captioning Model

This is a fine-tuned BLIP model for visual question answering on retail product images. It was fine-tuned on a custom dataset of images from online retail platforms, annotated with product descriptions.

This experimental model can be used to answer questions about retail product images. Possible applications include enriching product metadata and validating human-written product descriptions.
## 🚀 Quick Start

### Model Information
| Attribute | Details |
|-----------|---------|
| Model type | Fine-tuned BLIP model for visual question answering on retail product images |
| Training data | phiyodr/coco2017, plus a custom image dataset from online retail platforms annotated with product descriptions |
| Evaluation metric | BLEU |
| Library | transformers |
| License | Apache-2.0 |
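Since BLEU is the reported evaluation metric, below is a minimal sketch of how generated captions could be scored against reference product descriptions. It assumes the Hugging Face `evaluate` library; the example captions are hypothetical and this is not the official evaluation script.

```python
# Illustrative BLEU-scoring sketch (not the official evaluation script).
# Assumes the Hugging Face `evaluate` library: pip install evaluate
import evaluate

bleu = evaluate.load("bleu")

# Hypothetical model outputs and reference product descriptions
predictions = ["kitchenaid artisan stand mixer"]
references = [["KitchenAid Artisan 5-quart stand mixer"]]

results = bleu.compute(predictions=predictions, references=references)
print(results["bleu"])
```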
### Model Prediction Examples
| Input Image | Prediction |
|-------------|------------|
| (image) | kitchenaid artisann stand mixer |
| (image) | a bottle of milk sitting on a counter |
| (image) | dove sensitive skin lotion |
| (image) | bread bag with blue plastic handl |
| (image) | bush ' s best white beans |
## 💻 Usage Examples

### Basic Usage
```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the fine-tuned processor and model
processor = BlipProcessor.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")
model = BlipForConditionalGeneration.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")

# Download a demo image
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Conditional captioning: the text prompt steers the generated caption
text = "a photography of"
inputs = processor(raw_image, text, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))

# Unconditional captioning: no text prompt
inputs = processor(raw_image, return_tensors="pt")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
```
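For faster inference, the model can also be run on a GPU in half precision. A minimal sketch, assuming a CUDA-capable machine and a PyTorch build with CUDA support; the `max_new_tokens` value is illustrative:

```python
import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("quadranttechnologies/qhub-blip-image-captioning-finetuned")

# Load weights in float16 and move the model to the GPU (assumes CUDA is available)
model = BlipForConditionalGeneration.from_pretrained(
    "quadranttechnologies/qhub-blip-image-captioning-finetuned",
    torch_dtype=torch.float16,
).to("cuda")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Move inputs to the GPU and match the model's dtype
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=30)  # max_new_tokens chosen for illustration
print(processor.decode(out[0], skip_special_tokens=True))
```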
## 📚 Detailed Documentation

### BibTeX and Citation Information
```bibtex
@misc{https://doi.org/10.48550/arxiv.2201.12086,
  doi = {10.48550/ARXIV.2201.12086},
  url = {https://arxiv.org/abs/2201.12086},
  author = {Li, Junnan and Li, Dongxu and Xiong, Caiming and Hoi, Steven},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```