best_model_ViTB16_GPT2: an open-source cross-modal model that generates natural-language descriptions of images, free to use


Best Model ViTB16 GPT2

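Developed by evlinzxxx

A cross-modal model based on the Vision Transformer (ViT) and GPT-2 that generates natural-language descriptions of input images.

Image-to-Text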

Transformers

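Multilingual support #Multilingual image captioning #Vision Transformer architecture #Bilingual evaluation optimization

Downloads: 15

Release date: 5/19/2024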

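Model Overview

The model combines a ViT-B/16 vision encoder with a GPT-2 text decoder and is built specifically for image-to-text generation. It can generate image captions in English and Indonesian.

Model Features

Cross-modal understanding

Converts visual information into natural-language descriptions, turning an image into text.

Multilingual support

Generates image captions in English and Indonesian.

Pretrained architecture

Built on the pretrained ViT-B/16 vision encoder and GPT-2 text decoder.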
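To make the "B/16" in the encoder name concrete: the 16 is the side length, in pixels, of each square patch the image is cut into before it enters the transformer. The sketch below counts the tokens such an encoder would process. It assumes a 224×224 input, which is a common ViT-B/16 setting but is not stated on this page; the actual resolution is determined by the checkpoint's image processor configuration. The function name is illustrative only.

```python
def vit_sequence_length(image_size: int, patch_size: int = 16,
                        add_cls_token: bool = True) -> int:
    """Number of tokens a ViT encoder sees for a square input image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # ViT prepends one learnable [CLS] token to the patch sequence.
    return num_patches + (1 if add_cls_token else 0)


# Assumed 224x224 input: 14 x 14 = 196 patches, plus one [CLS] token.
print(vit_sequence_length(224))  # 197
```

The GPT-2 decoder then attends to this sequence of encoder outputs while it generates the caption one token at a time.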

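Model Capabilities

Image understanding

Multilingual text generation

Vision-language alignment

Scene description

Use Cases

Assistive technology

Assistance for visually impaired users

Generates descriptions of image content that can be delivered as speech to visually impaired users.

Helps visually impaired users understand visual content.

Content management

Automatic image annotation

Automatically generates descriptive tags for image libraries.

Improves image retrieval efficiency.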
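One way the annotation use case could work in practice is to turn each generated caption into tags and keep an inverted index from tag to image. The following is a minimal sketch under that assumption, not a feature shipped with the model: the stop-word list, the function names, and the second caption are hypothetical, while the first caption is the sample output shown in the quick-start example.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "to", "and", "with", "your", "is", "at"}


def caption_to_tags(caption: str) -> list[str]:
    """Lowercase the caption, drop stop words, keep unique words in order."""
    words = re.findall(r"[a-z]+", caption.lower())
    seen = set()
    tags = []
    for word in words:
        if word not in STOP_WORDS and word not in seen:
            seen.add(word)
            tags.append(word)
    return tags


def build_index(captions: dict[str, str]) -> dict[str, set[str]]:
    """Map each tag to the set of image names whose caption contains it."""
    index = defaultdict(set)
    for image_name, caption in captions.items():
        for tag in caption_to_tags(caption):
            index[tag].add(image_name)
    return dict(index)


captions = {
    "gl_16.jpg": "navigate around the obstacle ahead adjusting your route "
                 "to bypass the parked car.",
    "example_2.jpg": "a car parked on the street",  # hypothetical caption
}
index = build_index(captions)
print(sorted(index["car"]))  # ['example_2.jpg', 'gl_16.jpg']
```

Looking up a tag is then a single dictionary access instead of a scan over every caption, which is where the retrieval speed-up comes from.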

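🚀 Image Captioning Model

This project is an image captioning model: it converts an image into text and is suited to caption-generation scenarios. It supports Indonesian and English, and is evaluated with metrics such as BLEU and ROUGE.

🚀 Quick Start

Installing dependencies

Make sure the required libraries are installed. You can install them with the following command:

pip install transformers torch pillow

Running the code

The following minimal example shows how to generate an image caption with the pretrained model: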

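from transformers import VisionEncoderDecoderModel, ViTImageProcessor, GPT2Tokenizer
import torch
from PIL import Image
from IPython.display import display  # notebook-only; use image.show() in a plain script

# Load the pretrained model, image processor, and tokenizer
model = VisionEncoderDecoderModel.from_pretrained("evlinzxxx/best_model_ViTB16_GPT2")
image_processor = ViTImageProcessor.from_pretrained("evlinzxxx/best_model_ViTB16_GPT2")
tokenizer = GPT2Tokenizer.from_pretrained("evlinzxxx/best_model_ViTB16_GPT2")

# Select the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# NOTE: the original snippet called load_image() and get_caption() without defining
# them. The two helpers below are assumed minimal implementations, not the author's code.
def load_image(path):
    # Open a local image file and convert it to RGB
    return Image.open(path).convert("RGB")

def get_caption(model, image_processor, tokenizer, path, max_length=32):
    # max_length=32 is an assumed cap on caption length, not a documented setting
    image = load_image(path)
    pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(device)
    with torch.no_grad():
        output_ids = model.generate(pixel_values, max_length=max_length)
    # Decode token ids to text; returns a list with one string per image
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)

def show_image_and_captions(path):
    # Show the image
    display(load_image(path))
    # Generate the caption with this model
    our_caption = get_caption(model, image_processor, tokenizer, path)
    # Print the caption
    print(f"Our caption: {our_caption}")

# Generate a caption for a sample image
show_image_and_captions("/content/drive/MyDrive/try/test_400/gl_16.jpg") # output reported in the original: ['navigate around the obstacle ahead adjusting your route to bypass the parked car.']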

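✨ Key Features

Multilingual support: supports two languages, Indonesian (id) and English (en).
Evaluation metrics: model performance is evaluated with metrics such as BLEU and ROUGE.
Model architecture: uses a Vision Transformer (ViT-B/16) as the encoder, combined with GPT2 as the decoder.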
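To give a feel for what the two metrics named above measure, here is a simplified, dependency-free sketch. It is not the evaluation code used for this model: `bleu1` covers only unigrams with a single reference (full BLEU combines 1- to 4-gram precisions), and `rouge_l_f1` is a plain F1 over the longest common subsequence. Both function names and the example sentences are hypothetical.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Clipped unigram precision times the brevity penalty (one reference)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate word is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[word]) for word, count in Counter(cand).items())
    precision = clipped / len(cand)
    # Penalize candidates that are shorter than the reference.
    brevity_penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * precision


def rouge_l_f1(candidate: str, reference: str) -> float:
    """F1 score based on the longest common subsequence (LCS) of words."""
    cand, ref = candidate.split(), reference.split()
    # Standard dynamic-programming table for LCS length.
    dp = [[0] * (len(ref) + 1) for _ in range(len(cand) + 1)]
    for i, c in enumerate(cand, 1):
        for j, r in enumerate(ref, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if c == r else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "a car is parked on the road"  # hypothetical ground-truth caption
candidate = "a car parked on the road"     # hypothetical model output
print(round(bleu1(candidate, reference), 3))       # 0.846
print(round(rouge_l_f1(candidate, reference), 3))  # 0.923
```

BLEU is precision-oriented (how much of the generated caption appears in the reference), while ROUGE-L also rewards recall and word order, which is why the two are often reported together.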
