🚀 Qwen2.5-VL-7B-Instruct-gptqmodel-int8
This project provides a GPTQ-INT8 quantized version of Qwen2.5-VL-7B-Instruct, produced with the GPTQModel toolkit. INT8 weight quantization reduces the model's memory footprint and can speed up inference while preserving most of the full-precision model's quality.
🚀 Quick Start
✨ Features
- Quantization: applies the GPTQ-INT8 method (8-bit weights, group size 128) to Qwen2.5-VL-7B-Instruct; see the configuration sketch below.
- Toolkit: uses GPTQModel for calibration, quantization, and saving the quantized checkpoint.
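For reference, this is the quantization configuration the script below constructs (a minimal sketch; the `bits=8` and `group_size=128` values are taken directly from `gptqmodel_quantize.py`):

```python
from gptqmodel import QuantizeConfig

# 8-bit weight-only quantization; weights are split into groups of 128
# channels, and each group gets its own quantization scale/zero-point.
quant_config = QuantizeConfig(bits=8, group_size=128)
```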
📦 Installation
First, make sure Python 3.10 or later is installed. Then install the required dependency:
```bash
pip3 install -v "gptqmodel>=2.2.0" --no-build-isolation
```
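To verify the installation, you can check that the package imports and report its version (this assumes `gptqmodel` exposes a standard `__version__` attribute):

```bash
python3 -c "import gptqmodel; print(gptqmodel.__version__)"
```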
💻 Usage Examples
Basic Usage
To perform quantization, run the script with the source model path, the output path, and the bit width (8 for INT8):

```bash
python3 gptqmodel_quantize.py /path/to/Qwen2.5-VL-7B-Instruct/ /path/to/Qwen2.5-VL-7B-Instruct-gptqmodel-int8 8
```
Here is the full source of `gptqmodel_quantize.py`:
```python
import os

# Set environment variables before CUDA is initialized by any import.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
os.environ["PYTHONUTF8"] = "1"

import fire
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig
from gptqmodel.models.definitions.base_qwen2_vl import BaseQwen2VLGPTQ


def format_qwen2_vl_dataset(image, assistant):
    # Build one chat-format calibration sample: an image plus a captioning
    # prompt from the user, and the reference caption as the assistant reply.
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "generate a caption for this image"},
            ],
        },
        {"role": "assistant", "content": assistant},
    ]


def prepare_dataset(format_func, n_sample: int = 20) -> list[list[dict]]:
    # Pull image URLs and captions from a public captioning dataset.
    dataset = load_dataset(
        "laion/220k-GPT4Vision-captions-from-LIVIS", split=f"train[:{n_sample}]"
    )
    return [format_func(sample["url"], sample["caption"]) for sample in dataset]


def get_calib_dataset(model):
    if isinstance(model, BaseQwen2VLGPTQ):
        return prepare_dataset(format_qwen2_vl_dataset, n_sample=256)
    raise NotImplementedError(f"Unsupported MODEL: {model.__class__}")


def quantize(model_path: str, output_path: str, bit: int):
    # Group-wise GPTQ quantization: each group of 128 weight channels
    # shares one quantization scale/zero-point.
    quant_config = QuantizeConfig(bits=bit, group_size=128)

    model = GPTQModel.load(model_path, quant_config)
    calibration_dataset = get_calib_dataset(model)
    model.quantize(calibration_dataset, batch_size=8)
    model.save(output_path)

    # Reload the quantized checkpoint and run a quick text-only smoke test.
    model = GPTQModel.load(output_path)
    result = model.generate("Uncovering deep insights begins with")[0]
    print(model.tokenizer.decode(result))


if __name__ == "__main__":
    fire.Fire(quantize)
```
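After quantization, the saved checkpoint can be loaded back through GPTQModel for inference, mirroring the smoke test at the end of the script (a minimal text-only sketch; the path is the output directory passed on the command line above):

```python
from gptqmodel import GPTQModel

# Load the INT8 checkpoint produced by gptqmodel_quantize.py.
model = GPTQModel.load("/path/to/Qwen2.5-VL-7B-Instruct-gptqmodel-int8")

# Text-only generation; the tokenizer is attached to the loaded model.
tokens = model.generate("Uncovering deep insights begins with")[0]
print(model.tokenizer.decode(tokens))
```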
📄 License
This project is licensed under the MIT license.
| Property | Details |
|----------|---------|
| License | MIT |
| Base Model | Qwen/Qwen2.5-VL-7B-Instruct |
| Pipeline Tag | image-text-to-text |
| Library Name | transformers |
| Tags | text-generation-inference |