Florence-2-Flux-Large: A Free, Open-Source Vision-Language Model for Image Understanding and Text Generation

Florence 2 Flux Large

Developed by gokaygokay

A vision-language model based on Microsoft Florence-2-large, suited to image understanding and text generation tasks

Image-to-Text

Transformers

Supports multiple languages

License: Apache-2.0

Tags: #image-text-generation #multimodal-understanding #high-precision-description

Downloads: 14.96k

Released: 8/25/2024

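Model Overview

A multimodal model built on the Florence-2 architecture. It takes an image together with a text prompt and generates high-quality text descriptions and answers.

Model Features

Multimodal understanding

Processes image and text inputs together, interpreting the visual content and generating related text

High-quality description generation

Generates detailed, accurate image descriptions

Task adaptability

Adapts to different vision-language tasks through a task prompt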
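
The run_example function in the usage section of this page forms the model input by concatenating the task prompt with the free-text instruction. The sketch below isolates that step. `build_prompt` is a hypothetical helper name, and the angle-bracket check is an assumption about how task tokens are written; the only task token shown on this page is `<DESCRIPTION>`.

```python
def build_prompt(task_prompt: str, text_input: str = "") -> str:
    """Concatenate a task token and an optional instruction, as run_example does."""
    # Assumption: task tokens are wrapped in angle brackets, like "<DESCRIPTION>".
    if not (task_prompt.startswith("<") and task_prompt.endswith(">")):
        raise ValueError(f"task prompt should look like '<TASK>', got {task_prompt!r}")
    return task_prompt + text_input


prompt = build_prompt("<DESCRIPTION>", "Describe this image in great detail.")
print(prompt)
```

Validating the token early gives a clear error before any model call is made, rather than an unparseable generation later.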

Model Capabilities

Image understanding

Text generation

Image captioning

Visual question answering

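Use Cases

Content understanding and generation

Image captioning

Generate detailed, accurate text descriptions of images

Produces natural-language descriptions that match the image content

Visual question answering

Answer natural-language questions about image content

Provides accurate, relevant answers

Assistive tools

Visual content analysis

Analyze image content and extract key information

Outputs the important elements and relationships in an image in structured form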

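🚀 Image-Text-to-Text Model

This project is an image-text-to-text model built on the transformers library. It uses microsoft/Florence-2-large as its base model to perform tasks such as image captioning, and it has some application value in art-related work.

🚀 Quick Start

The steps below install the dependencies, load the model, and generate a description for a sample image.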

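📦 Installation

Before using the model, install the required dependencies with the following command. The usage example also imports transformers, torch, Pillow (PIL), and requests, so those need to be installed as well.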

pip install -q datasets flash_attn timm einops
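
Before loading the model, it can help to confirm that the packages from the command above are importable. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and the list assumes each package's import name matches the name used in the pip command.

```python
import importlib.util

# Import names of the packages from the pip command above.
REQUIRED = ["datasets", "flash_attn", "timm", "einops"]


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays fast even for heavy packages.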

💻 Usage Example

Basic usage

The following code loads the model, processes the input, and generates an image description:

from transformers import AutoModelForCausalLM, AutoProcessor
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = AutoModelForCausalLM.from_pretrained("gokaygokay/Florence-2-Flux-Large", trust_remote_code=True).to(device).eval()
processor = AutoProcessor.from_pretrained("gokaygokay/Florence-2-Flux-Large", trust_remote_code=True)

# Run the model on one image and return the parsed output for the given task prompt
def run_example(task_prompt, text_input, image):
    prompt = task_prompt + text_input

    # Ensure the image is in RGB mode
    if image.mode != "RGB":
        image = image.convert("RGB")

    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        repetition_penalty=1.10,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed_answer = processor.post_process_generation(generated_text, task=task_prompt, image_size=(image.width, image.height))
    return parsed_answer

from PIL import Image
import requests

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)
answer = run_example("<DESCRIPTION>", "Describe this image in great detail.", image)

final_answer = answer["<DESCRIPTION>"]
print(final_answer)
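
To caption a whole folder instead of a single URL, the per-image call can be wrapped in a small loop. The sketch below keeps the model out of the helper by accepting a `describe` callable, so it can be tried without a GPU. `find_images`, `caption_folder`, and the extension list are hypothetical names and assumptions for illustration; none of them come from the model card.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

# Assumption: these extensions cover the images you want to caption.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(folder: Path) -> Iterable[Path]:
    """Yield image files in `folder` (non-recursive), sorted by name."""
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def caption_folder(folder: Path, describe: Callable[[Path], str]) -> Dict[str, str]:
    """Map each image file name to the caption returned by `describe`."""
    return {path.name: describe(path) for path in find_images(folder)}
```

With the model loaded as in the example above, `describe` could be `lambda p: run_example("<DESCRIPTION>", "Describe this image in great detail.", Image.open(p))["<DESCRIPTION>"]`.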

📄 License

This project is licensed under Apache-2.0.

🔍 Model Information

Property	Details
Model type	Image-text-to-text model
Base model	microsoft/Florence-2-large
Training dataset	kadirnar/fluxdev_controlnet_16k
Library name	transformers
Task tag	image-text-to-text
Related tags	art