Florence 2 Flux Large | AIbase Model Library

Florence 2 Flux Large

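Developed by gokaygokay

A vision-language model based on Microsoft Florence-2-large, suited to image understanding and text generation tasks.

Image-to-Text

Transformers

Supports multiple languages; open-source license: Apache-2.0; tags: #image-text-generation #multimodal-understanding #high-precision-captioning

Downloads: 14.96k

Release date: 8/25/2024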

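Model Overview

This is a multimodal model built on the Florence-2 architecture. It takes image and text inputs and generates text descriptions and answers.

Model Features

Multimodal understanding

Processes image and text inputs together, interpreting the visual content and generating related text.

High-quality caption generation

Generates detailed, accurate image descriptions.

Task adaptability

Adapts to different vision-language tasks through a task prompt.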
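As a rough illustration of the task-prompt mechanism, the sketch below builds a prompt by prepending a task token to free text, mirroring the `prompt = task_prompt + text_input` line in this page's usage example. The helper name `build_prompt` and the token-format check are hypothetical and only assume that task prompts look like `<DESCRIPTION>`; they are not part of the model's API.

```python
import re

# Assumption: a task prompt is a single angle-bracket token such as "<DESCRIPTION>",
# the only token shown on this page. The pattern is illustrative, not official.
TASK_TOKEN_PATTERN = re.compile(r"<[A-Z_]+>")


def build_prompt(task_prompt: str, text_input: str = "") -> str:
    """Concatenate a task token and optional free text into one prompt string."""
    if not TASK_TOKEN_PATTERN.fullmatch(task_prompt):
        raise ValueError(f"Not a task token: {task_prompt!r}")
    return task_prompt + text_input


print(build_prompt("<DESCRIPTION>", "Describe this image in great detail."))
```

Switching tasks would then mean changing only the token passed as `task_prompt`, while the image and the rest of the pipeline stay the same.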

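Model Capabilities

Image understanding

Text generation

Image caption generation

Visual question answering

Use Cases

Content understanding and generation

Image caption generation

Generates detailed, accurate text descriptions for images.

Produces natural-language descriptions that match the image content.

Visual question answering

Answers natural-language questions about image content.

Provides accurate, relevant answers.

Assistive tools

Visual content analysis

Analyzes image content and extracts key information.

Outputs the important elements in an image and their relationships in structured form.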

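🚀 Image-Text-to-Text Model

This project is an image-text-to-text model built on the transformers library. It uses microsoft/Florence-2-large as its base model, supports tasks such as image captioning, and has some application value in the art domain.

🚀 Quick Start

The steps below show how to install the dependencies and run the model.

📦 Installation

Before using the model, install the required dependencies with the following command: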

pip install -q datasets flash_attn timm einops
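Before loading the model, it can help to confirm that the packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` and the module list are assumptions drawn from the install command above and from the imports in the usage example, not an official requirements list.

```python
from importlib.util import find_spec

# Import names (not pip names): Pillow is imported as "PIL".
# Assumed list: packages from the pip command plus those imported in the usage example.
REQUIRED_MODULES = [
    "torch", "transformers", "PIL", "requests",
    "datasets", "flash_attn", "timm", "einops",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```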

💻 Usage Example

Basic Usage

The following code shows how to load the model, process the inputs, and generate an image description:

import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# trust_remote_code=True is required because the model ships custom modeling code
model = AutoModelForCausalLM.from_pretrained("gokaygokay/Florence-2-Flux-Large", trust_remote_code=True).to(device).eval()
processor = AutoProcessor.from_pretrained("gokaygokay/Florence-2-Flux-Large", trust_remote_code=True)

# Function to run the model on an example
def run_example(task_prompt, text_input, image):
    # The task token is prepended to the free-text instruction
    prompt = task_prompt + text_input

    # Ensure the image is in RGB mode
    if image.mode != "RGB":
        image = image.convert("RGB")

    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        repetition_penalty=1.10,
    )
    # Keep special tokens so the processor can parse the task output
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed_answer = processor.post_process_generation(generated_text, task=task_prompt, image_size=(image.width, image.height))
    return parsed_answer

# Download a sample image and describe it
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)
answer = run_example("<DESCRIPTION>", "Describe this image in great detail.", image)

# The parsed answer is keyed by the task prompt
final_answer = answer["<DESCRIPTION>"]
print(final_answer)
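The example above reads the result with `answer["<DESCRIPTION>"]`, which assumes the key is always present. A slightly more defensive variant might look like the sketch below. The helper `extract_caption` is hypothetical, and the sample dictionary is a stand-in written for illustration, not real model output.

```python
def extract_caption(parsed_answer, task_prompt="<DESCRIPTION>"):
    """Return the caption stored under the task key, with whitespace normalized."""
    if not isinstance(parsed_answer, dict) or task_prompt not in parsed_answer:
        raise KeyError(f"No entry for {task_prompt!r} in model output")
    caption = parsed_answer[task_prompt]
    if not isinstance(caption, str):
        raise TypeError(f"Expected text for {task_prompt!r}, got {type(caption).__name__}")
    # Collapse newlines and repeated spaces into single spaces
    return " ".join(caption.split())


# Stand-in for the dictionary returned by run_example (not real model output)
sample_answer = {"<DESCRIPTION>": "  A placeholder\n caption.  "}
print(extract_caption(sample_answer))
```

Normalizing whitespace this way is a design choice aimed at passing the caption on as a single-line prompt; skip it if line breaks in the description matter.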

📄 License

This project is released under the Apache-2.0 license.

🔍 Model Information

Property	Details
Model type	Image-text-to-text model
Base model	microsoft/Florence-2-large
Training dataset	kadirnar/fluxdev_controlnet_16k
Library name	transformers
Task tag	image-text-to-text
Related tags	art