Q

Qwen2.5 VL 7B Instruct GPTQ Int3

Developed by hfl
This is an unofficial GPTQ-Int3 quantized version based on the Qwen2.5-VL-7B-Instruct model, suitable for multimodal image-text-to-text tasks.
Downloads 577
Release Time : 3/20/2025

Model Overview

This model is a multimodal model capable of processing both image and text inputs to generate text outputs. Primarily designed for image understanding and text generation tasks.

Model Features

Efficient Quantization
Utilizes GPTQ-Int3 quantization technology to significantly reduce model disk usage and VRAM requirements.
Multimodal Support
Capable of processing both image and text inputs for image understanding and text generation.
High Performance
Demonstrates excellent performance on benchmarks like ChartQA and OCRBench.
Strong Compatibility
Compatible with the latest transformers library and allows seamless switching with non-quantized Qwen2.5-VL models.

Model Capabilities

Image Understanding
Text Generation
Multimodal Reasoning
Image Captioning
Visual Question Answering

Use Cases

Image Understanding
Image Captioning
Generates detailed textual descriptions from input images
As shown in examples, accurately describes image content and details
Visual Question Answering
Chart Understanding
Answers questions about chart content
Achieved 78.56 score on ChartQA test
Document Processing
OCR Enhancement
Extracts and understands text content from images
Scored 823 on OCRBench test
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase