Qwen2.5-VL-3B-Instruct-GPTQ-Int4 Open-Source Model - Free Deployment to Solve Multimedia Tasks from Images and Text to Text

Qwen2.5 VL 3B Instruct GPTQ Int4

Developed by hfl

This is the GPTQ-Int4 quantized version of the Qwen2.5-VL-3B-Instruct model, suitable for multimodal tasks involving image-to-text and text-to-text, supporting both Chinese and English.

Image-to-Text

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Multimodal Understanding #GPTQ Quantization #Image-Text QA

Downloads 1,312

Release Time: 2/24/2025

Model Overview

This model is a GPTQ-Int4 quantized version based on Qwen2.5-VL-3B-Instruct, primarily designed for multimodal tasks involving images and text, capable of generating text descriptions related to images or answering relevant questions.

Model Features

Efficient Quantization

GPTQ-Int4 quantization cuts the disk size from 7.1 GB to 3.2 GB and lowers VRAM requirements, while keeping benchmark scores close to those of the original model.

Multimodal Support

Capable of processing both image and text inputs to generate relevant text outputs.

High Performance

Scores 82.56 on ChartQA (test) and 784 on OCRBench, close to the original model's 83.48 and 791.

Model Capabilities

Image Caption Generation

Image Question Answering

Multimodal Text Generation

Use Cases

Image Understanding

Image Description

Generate detailed descriptions of input images.

Example output: This image shows a bilingual logo in Chinese and English, reading "中文LLaMA & Alpaca大模型" and "Chinese LLaMA & Alpaca Large Language Models".

Image QA

Answer questions related to image content.
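
A minimal sketch of how an image question could be packaged for the model, assuming the chat message layout used in this card's usage example (one user turn whose content holds an image entry and a text entry). The helper name `build_image_qa_messages` and the sample question are hypothetical and not part of any library.

```python
from typing import Any


def build_image_qa_messages(image: str, question: str) -> list[dict[str, Any]]:
    """Build a single-turn chat message asking `question` about `image`.

    `image` is a URL or local path string. This helper is hypothetical and
    for illustration only; it does not load or validate the image.
    """
    if not question.strip():
        raise ValueError("question must not be empty")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": question},
            ],
        }
    ]


messages = build_image_qa_messages(
    "https://raw.githubusercontent.com/ymcui/Chinese-LLaMA-Alpaca-3/refs/heads/main/pics/banner.png",
    "What number appears on the right side of the banner?",
)
print(messages[0]["content"][1]["text"])
```

The resulting `messages` list can be passed to the processor in the same way as in the usage example.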

🚀 Qwen2.5-VL-3B-Instruct-GPTQ-Int4

This is an unofficial GPTQ-Int4 quantized version of the Qwen2.5-VL model, produced with the gptqmodel library. It is compatible with the latest transformers library, the same one used to run non-quantized Qwen2.5-VL models.

✨ Features

This is an UNOFFICIAL GPTQ-Int4 quantized version of the Qwen2.5-VL model, produced with the gptqmodel library.
Compatible with the latest transformers library, which can also run non-quantized Qwen2.5-VL models.

📦 Installation

Install the required libraries:

pip install git+https://github.com/huggingface/transformers accelerate qwen-vl-utils
pip install git+https://github.com/huggingface/optimum.git
pip install gptqmodel

Optionally, you may need to install:

pip install tokenicer device_smi logbar
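
After installing, a quick check like the following can show which packages are still missing. This is a sketch, not part of the original instructions: the import names are assumed to match the pip package names with hyphens replaced by underscores, and the helper `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Assumption: import names match the pip names, with hyphens replaced by
# underscores (e.g. qwen-vl-utils -> qwen_vl_utils).
REQUIRED = ["transformers", "accelerate", "qwen_vl_utils", "optimum", "gptqmodel"]
OPTIONAL = ["tokenicer", "device_smi", "logbar"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```

An empty list for the required packages suggests the environment is ready for the usage example.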

💻 Usage Examples

Basic Usage

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the quantized model; device_map="auto" places the weights on the available device(s).
# attn_implementation="flash_attention_2" requires the flash-attn package;
# remove that argument to fall back to the default attention implementation.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int4",
    attn_implementation="flash_attention_2",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int4")

# One user turn with an image and a text prompt (the prompt means "Please describe this image.").
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://raw.githubusercontent.com/ymcui/Chinese-LLaMA-Alpaca-3/refs/heads/main/pics/banner.png"},
        {"type": "text", "text": "请你描述一下这张图片。"},
    ],
}]

# Render the chat template to text and extract the image/video inputs from the messages.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)  # move inputs to the device the model was loaded on

# Generate, then strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = model.generate(**inputs, max_new_tokens=512)
generated_ids_trimmed = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text[0])

Response

This image shows a logo in both Chinese and English, reading "中文LLaMA & Alpaca大模型" and "Chinese LLaMA & Alpaca Large Language Models". On the left side of the logo, there are two cartoon characters, one is an alpaca with a red scarf, and the other is an alpaca with white fur. The background is a green grassland and a building with a red roof. On the right side of the logo, there is a number 3, accompanied by some circuit patterns. The overall design is simple and clear, using bright colors and cute cartoon characters to attract attention.
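
The trimming step in the example is needed because `generate` returns each prompt followed by the newly generated tokens. A toy illustration, with plain Python lists standing in for token-id tensors (the ids are made up):

```python
# Toy stand-ins for token ids; real code uses tensors from the processor and model.
input_ids = [[101, 7, 8], [101, 9, 10]]  # two prompts of equal (padded) length
generated_ids = [[101, 7, 8, 42, 43], [101, 9, 10, 55, 56]]  # prompt + new tokens

# Drop the first len(prompt) ids from each output so only the new tokens remain.
trimmed = [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
print(trimmed)  # [[42, 43], [55, 56]]
```

Decoding the untrimmed ids would repeat the prompt text at the start of the response.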

📚 Documentation

Performance

Model	Size (Disk)	ChartQA (test)	OCRBench
Qwen2.5-VL-3B-Instruct	7.1 GB	83.48	791
Qwen2.5-VL-3B-Instruct-AWQ	3.2 GB	82.52	786
Qwen2.5-VL-3B-Instruct-GPTQ-Int4	3.2 GB	82.56	784
Qwen2.5-VL-3B-Instruct-GPTQ-Int3	2.9 GB	76.68	742
Qwen2.5-VL-7B-Instruct	16.0 GB	83.2	846
Qwen2.5-VL-7B-Instruct-AWQ	6.5 GB	79.68	837
Qwen2.5-VL-7B-Instruct-GPTQ-Int4	6.5 GB	81.48	845
Qwen2.5-VL-7B-Instruct-GPTQ-Int3	5.8 GB	78.56	823
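
To make the trade-off concrete, the sketch below computes how much disk size and benchmark score each GPTQ-Int4 model keeps relative to its non-quantized baseline. The numbers are copied from the Performance table; the `compare` helper is hypothetical and only does the arithmetic.

```python
# (disk size in GB, ChartQA test score, OCRBench score), copied from the Performance table.
results = {
    "Qwen2.5-VL-3B-Instruct": (7.1, 83.48, 791),
    "Qwen2.5-VL-3B-Instruct-GPTQ-Int4": (3.2, 82.56, 784),
    "Qwen2.5-VL-7B-Instruct": (16.0, 83.2, 846),
    "Qwen2.5-VL-7B-Instruct-GPTQ-Int4": (6.5, 81.48, 845),
}


def compare(baseline: str, quantized: str) -> dict[str, float]:
    """Return size and score ratios of `quantized` relative to `baseline`."""
    b_size, b_chart, b_ocr = results[baseline]
    q_size, q_chart, q_ocr = results[quantized]
    return {
        "size_ratio": q_size / b_size,
        "chartqa_ratio": q_chart / b_chart,
        "ocrbench_ratio": q_ocr / b_ocr,
    }


for base in ("Qwen2.5-VL-3B-Instruct", "Qwen2.5-VL-7B-Instruct"):
    ratios = compare(base, base + "-GPTQ-Int4")
    print(base, {k: round(v, 3) for k, v in ratios.items()})
```

For the 3B model, the Int4 version takes about 45% of the disk space while keeping roughly 99% of both scores.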

Note

Evaluations are performed using [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) with default settings.
GPTQ models are more computationally efficient (lower VRAM usage, faster inference) than the AWQ series in these evaluations.
We recommend using gptqmodel instead of the autogptq library, as autogptq is no longer maintained.

📄 License

This project is licensed under the Apache 2.0 license.

⚠️ Important Note

This is NOT an official model by Qwen. Use at your own risk.

💡 Usage Tip

For detailed usage, please check [Qwen2.5-VL's page](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct).
