Qwen2-VL-7B-Captioner-Relaxed Open-Source Model - Generating Detailed Image Descriptions for Text-to-Image Dataset Creation

Qwen2 VL 7B Captioner Relaxed

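Developed by Ertugrul

An instruction-tuned version of Qwen2-VL-7B-Instruct that focuses on generating more detailed image descriptions and is optimized for text-to-image dataset creation.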

Image-to-Text

Transformers

Language: English · License: Apache-2.0 · #multimodal image captioning #detailed image analysis #text-to-image optimization

Downloads: 4,080

Released: 9/23/2024

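Model Overview

This is a multimodal large language model that has been fine-tuned to produce more comprehensive and detailed image descriptions. It is particularly suited to generating captions in a format compatible with text-to-image models.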

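Model Features

Enhanced detail

Generates more comprehensive and nuanced image descriptions

Relaxed constraints

Produces less restrictive image descriptions than the base model

Natural language output

Describes the different subjects in an image and their positions in natural language

Optimized for image generation

Produces captions in a format compatible with state-of-the-art text-to-image generation models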

Model Capabilities

Image caption generation

Multimodal understanding

Natural language processing

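Use Cases

Data generation

Text-to-image dataset creation

Creates high-quality datasets for training text-to-image generation models

Generates detailed descriptions compatible with image generation models

Content understanding

Image content analysis

Describes and analyzes image content in detail

Provides a comprehensive understanding of image content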
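
For the dataset-creation use case, one possible workflow is to store each caption in a text file next to its image. The sketch below is only an illustration: the sidecar `.txt` layout, the helper name `write_sidecar_captions`, and the `caption_fn` callable are assumptions, not something this page specifies. `caption_fn` stands in for whatever function wraps the model inference shown in the Quick Start section.

```python
from pathlib import Path

# Assumption: which file extensions count as images for this dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def write_sidecar_captions(image_dir, caption_fn, overwrite=False):
    """Write one caption file per image (photo.png -> photo.txt).

    caption_fn is a hypothetical callable that takes an image path and
    returns a caption string, for example a wrapper around the model.
    Returns the list of caption files that were written.
    """
    written = []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files, including earlier captions
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists() and not overwrite:
            continue  # keep captions from a previous run
        caption = caption_fn(image_path).strip()
        caption_path.write_text(caption + "\n", encoding="utf-8")
        written.append(caption_path)
    return written
```

Skipping existing caption files by default lets an interrupted captioning run resume without regenerating captions that were already written.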

🚀 Qwen2-VL-7B-Captioner-Relaxed

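Qwen2-VL-7B-Captioner-Relaxed is an advanced multimodal large language model, instruction fine-tuned from Qwen2-VL-7B-Instruct. It was fine-tuned on a dataset carefully curated for text-to-image models, and it provides more detailed descriptions of a given image.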

🚀 Quick Start

Requirements

If you encounter errors such as KeyError: 'qwen2_vl' or ImportError: cannot import name 'Qwen2VLForConditionalGeneration' from 'transformers', try installing the latest version of the transformers library from source:

pip install git+https://github.com/huggingface/transformers
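
Before reinstalling, it can help to check which transformers version is installed. The sketch below is a minimal check using only the standard library. It assumes that 4.45.0 is the first release that includes Qwen2-VL; that threshold and the helper names are assumptions for illustration, so verify the threshold against the transformers release notes.

```python
import re
from importlib import metadata

# Assumption: Qwen2-VL support first shipped in transformers 4.45.0.
MIN_VERSION = (4, 45, 0)


def parse_version(version):
    """Turn a string like '4.45.0.dev0' into a tuple like (4, 45, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_qwen2_vl(version, minimum=MIN_VERSION):
    """Return True if the version string is at or above the assumed minimum."""
    return parse_version(version) >= minimum


def installed_transformers_version():
    """Return the installed transformers version, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif supports_qwen2_vl(found):
        print(f"transformers {found} should include Qwen2-VL")
    else:
        print(f"transformers {found} looks too old; consider upgrading")
```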

Code Example

from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from transformers import BitsAndBytesConfig
import torch

model_id = "Ertugrul/Qwen2-VL-7B-Captioner-Relaxed"

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

image = Image.open(r"PATH_TO_YOUR_IMAGE")

# If the image does not fit in VRAM, resize it here or set the model's maximum image size.
# image = image.resize((1024, 1024))  # for example

text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(
    text=[text_prompt], images=[image], padding=True, return_tensors="pt"
)
inputs = inputs.to("cuda")

with torch.no_grad():
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        output_ids = model.generate(
            **inputs,
            max_new_tokens=384,
            do_sample=True,
            temperature=0.7,
            use_cache=True,
            top_k=50,
        )

# Drop the prompt tokens so that only the newly generated caption is decoded.
generated_ids = [
    out_ids[len(in_ids) :]
    for in_ids, out_ids in zip(inputs.input_ids, output_ids)
]
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)[0]
print(output_text)
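
The resize comment in the example above uses a fixed (1024, 1024) target, which changes the aspect ratio of non-square images. As an alternative, the sketch below computes a target size that keeps the aspect ratio while staying under a pixel budget. The helper name `fit_to_pixel_budget` and the default budget of 1024 * 1024 pixels are assumptions chosen for illustration, not values from the model card.

```python
import math


def fit_to_pixel_budget(width, height, max_pixels=1024 * 1024):
    """Return (width, height) scaled down to fit within max_pixels.

    The aspect ratio is preserved up to integer rounding. Images that are
    already within the budget are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    # Rounding down keeps the result at or under the budget.
    return max(1, int(width * scale)), max(1, int(height * scale))


# Usage with the PIL image from the example above (not executed here):
# image = image.resize(fit_to_pixel_budget(*image.size))
```

The Qwen2-VL processor also documents `min_pixels` and `max_pixels` options that bound image size on the processor side, which may be what the original comment means by setting the model's maximum size.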