Qwen2.5-VL-3B-Instruct-GPTQ-Int3 Open Source Model - Efficient Multimodal Image-Text Processing with Lower VRAM Usage and Faster Inference

Qwen2.5 VL 3B Instruct GPTQ Int3

Developed by hfl

The GPTQ-Int3 quantized version of Qwen2.5-VL-3B-Instruct, suitable for multimodal image-text processing tasks with reduced VRAM usage and faster inference speed.

Image-to-Text

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Multimodal Image-Text Understanding #GPTQ Low-Bit Quantization #Efficient Visual Question Answering

Downloads 60

Release Time: 3/20/2025

Model Overview

This is a GPTQ-Int3 quantized version of the Qwen2.5-VL-3B-Instruct model. It targets multimodal image-text tasks such as visual question answering and OCR.

Model Features

Efficient Quantization

Utilizes GPTQ-Int3 quantization technology to significantly reduce model disk space and VRAM requirements

Multimodal Support

Processes both image and text inputs simultaneously for visual-language interaction

Performance Retention

Retains most task performance after quantization: 76.68 on ChartQA (vs. 83.48 for the unquantized model) and 742 on OCRBench (vs. 791)

Computational Efficiency

In the evaluations reported here, the GPTQ models used less VRAM and ran inference faster than the AWQ quantized versions

Model Capabilities

Image caption generation

Visual question answering

OCR text recognition

Multimodal interaction
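
All of the capabilities listed above go through the same chat message structure that the usage example in this card passes to the processor; only the text prompt changes. The sketch below illustrates that idea. The helper name `build_messages` and the prompt strings in `TASK_PROMPTS` are hypothetical and not part of the model or any library.

```python
from typing import Any, Dict, List, Optional

# Hypothetical default prompts, one per capability. Adjust to your task.
TASK_PROMPTS: Dict[str, str] = {
    "caption": "Describe this image.",
    "vqa": "What is the main subject of this image?",
    "ocr": "Read all the text in this image.",
}


def build_messages(image: str, task: str, question: Optional[str] = None) -> List[Dict[str, Any]]:
    """Build a single-turn user message with one image and one text prompt.

    `image` is a URL or local path; `question` overrides the default prompt.
    """
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    prompt = question if question is not None else TASK_PROMPTS[task]
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }]


messages = build_messages("https://example.com/scan.png", "ocr")
print(messages[0]["content"][1]["text"])
```

The resulting list has the same shape as the `messages` variable in the usage example, so it can be passed to `processor.apply_chat_template` in the same way.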

Use Cases

Education

Chart Comprehension

Helps students understand data in complex charts

Achieves 76.68 points on the ChartQA test set

Document Processing

OCR Enhancement

Recognizes and understands text-image content in scanned documents

Scores 742 on OCRBench

Content Moderation

Multimodal Content Analysis

Simultaneously analyzes image and text content for moderation

🚀 Qwen2.5-VL-3B-Instruct-GPTQ-Int3

This project offers an UNOFFICIAL GPTQ-Int3 quantized version of the Qwen2.5-VL model, built with the gptqmodel library. It is compatible with the latest transformers library, which also supports non-quantized Qwen2.5-VL models.

✨ Features

Quantization: Utilizes GPTQ-Int3 quantization to reduce model size and enhance computational efficiency.
Compatibility: Works seamlessly with the transformers library, facilitating easy integration.

📦 Installation

Install the required libraries:

pip install git+https://github.com/huggingface/transformers accelerate qwen-vl-utils
pip install git+https://github.com/huggingface/optimum.git
pip install gptqmodel

Optionally, you may need to install:

pip install tokenicer device_smi logbar
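
After installing, a quick check can confirm which packages are importable before loading the model. This is a minimal sketch using only the standard library. The module names are assumed to match the pip package names above (with `qwen-vl-utils` imported as `qwen_vl_utils`, as in the usage example), and the helper name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec
from typing import List

# Import names assumed to correspond to the pip packages listed above.
REQUIRED = ["transformers", "accelerate", "qwen_vl_utils", "optimum", "gptqmodel"]
OPTIONAL = ["tokenicer", "device_smi", "logbar"]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


print("Missing required:", missing_modules(REQUIRED))
print("Missing optional:", missing_modules(OPTIONAL))
```

An empty list for the required modules means the imports in the usage example should resolve; it does not verify versions or GPU support.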

💻 Usage Examples

Basic Usage

from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info

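# Repository ID of the Int3 checkpoint described in this card
model_id = "hfl/Qwen2.5-VL-3B-Instruct-GPTQ-Int3"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    # Requires the flash-attn package; remove this argument if it is not installed
    attn_implementation="flash_attention_2",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)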

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://raw.githubusercontent.com/ymcui/Chinese-LLaMA-Alpaca-3/refs/heads/main/pics/banner.png"},
        {"type": "text", "text": "请你描述一下这张图片。"},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=512)
generated_ids_trimmed = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text[0])

Output Example

Response:

This image shows a logo in both Chinese and English, reading "中文LLaMA & Alpaca大模型" (Chinese LLaMA & Alpaca Large Language Models). On the left side of the logo, there are two cartoon characters: one is an alpaca with a red scarf, and the other is an alpaca with white fur. The background features a green grassland and a building with a red roof. On the right side of the logo, there is a number 3, accompanied by some circuit patterns. The overall design is simple and clear, using bright colors and cute cartoon characters to attract attention.
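
The trimming step in the usage code is needed because `generate` returns the prompt tokens followed by the newly generated tokens, so the prompt prefix has to be sliced off before decoding. The sketch below shows the same slicing on plain Python lists with made-up token IDs; the helper name `trim_prompt_tokens` is hypothetical, and in real use the rows are tensors rather than lists.

```python
from typing import List


def trim_prompt_tokens(input_ids: List[List[int]], generated_ids: List[List[int]]) -> List[List[int]]:
    """Drop the prompt prefix from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Made-up token IDs: each output row starts with its own prompt row.
prompts = [[101, 7, 8], [101, 9]]
outputs = [[101, 7, 8, 42, 43], [101, 9, 55]]

print(trim_prompt_tokens(prompts, outputs))  # prints [[42, 43], [55]]
```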

📚 Documentation

Performance

Model	Size (Disk)	ChartQA (test)	OCRBench
Qwen2.5-VL-3B-Instruct	7.1 GB	83.48	791
Qwen2.5-VL-3B-Instruct-AWQ	3.2 GB	82.52	786
Qwen2.5-VL-3B-Instruct-GPTQ-Int4	3.2 GB	82.56	784
Qwen2.5-VL-3B-Instruct-GPTQ-Int3	2.9 GB	76.68	742
Qwen2.5-VL-7B-Instruct	16.0 GB	83.2	846
Qwen2.5-VL-7B-Instruct-AWQ	6.5 GB	79.68	837
Qwen2.5-VL-7B-Instruct-GPTQ-Int4	6.5 GB	81.48	845
Qwen2.5-VL-7B-Instruct-GPTQ-Int3	5.8 GB	78.56	823
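
To make the size and accuracy trade-off in the table easier to read, the sketch below computes the disk-size reduction and score retention of a quantized model relative to its unquantized baseline. The numbers are copied from the table above; the helper name `compare` and the dictionary layout are illustrative assumptions.

```python
from typing import Dict, Tuple

# (disk size in GB, ChartQA test score, OCRBench score), copied from the table.
RESULTS: Dict[str, Tuple[float, float, float]] = {
    "Qwen2.5-VL-3B-Instruct": (7.1, 83.48, 791),
    "Qwen2.5-VL-3B-Instruct-AWQ": (3.2, 82.52, 786),
    "Qwen2.5-VL-3B-Instruct-GPTQ-Int4": (3.2, 82.56, 784),
    "Qwen2.5-VL-3B-Instruct-GPTQ-Int3": (2.9, 76.68, 742),
    "Qwen2.5-VL-7B-Instruct": (16.0, 83.2, 846),
    "Qwen2.5-VL-7B-Instruct-AWQ": (6.5, 79.68, 837),
    "Qwen2.5-VL-7B-Instruct-GPTQ-Int4": (6.5, 81.48, 845),
    "Qwen2.5-VL-7B-Instruct-GPTQ-Int3": (5.8, 78.56, 823),
}


def compare(quantized: str, baseline: str) -> Dict[str, float]:
    """Return size reduction and score retention as fractions of the baseline."""
    q_size, q_chart, q_ocr = RESULTS[quantized]
    b_size, b_chart, b_ocr = RESULTS[baseline]
    return {
        "size_reduction": 1 - q_size / b_size,
        "chartqa_retention": q_chart / b_chart,
        "ocrbench_retention": q_ocr / b_ocr,
    }


stats = compare("Qwen2.5-VL-3B-Instruct-GPTQ-Int3", "Qwen2.5-VL-3B-Instruct")
for key, value in stats.items():
    print(f"{key}: {value:.1%}")
```

For the 3B model, this works out to roughly a 59% smaller checkpoint on disk while keeping about 92% of the ChartQA score and about 94% of the OCRBench score.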

Note

⚠️ Important Note

Evaluations were performed using lmms-eval with the default settings.

In these evaluations, GPTQ models were more computationally efficient (lower VRAM usage, faster inference) than the AWQ series.

We recommend using gptqmodel instead of the autogptq library, as autogptq is no longer maintained.

💡 Usage Tip

For detailed usage, please check Qwen2.5-VL's page.

Disclaimer

⚠️ Important Note

This is NOT an official model by Qwen. Use at your own risk.

📄 License

This project is licensed under the Apache-2.0 license.
