Q

Qwen2.5 VL 3B Instruct GPTQ Int3

Developed by hfl
The GPTQ-Int3 quantized version of Qwen2.5-VL-3B-Instruct, suitable for multimodal image-text processing tasks with reduced VRAM usage and faster inference speed.
Downloads 60
Release Time : 3/20/2025

Model Overview

This is a GPTQ-Int3 quantized version based on the Qwen2.5-VL-3B-Instruct model, focusing on multimodal interaction tasks between images and text, such as visual question answering and OCR recognition.

Model Features

Efficient Quantization
Utilizes GPTQ-Int3 quantization technology to significantly reduce model disk space and VRAM requirements
Multimodal Support
Processes both image and text inputs simultaneously for visual-language interaction
Performance Retention
Maintains high task performance after quantization, such as in ChartQA and OCRBench
Computational Efficiency
Compared to AWQ quantized versions, requires less VRAM and offers faster inference speed

Model Capabilities

Image caption generation
Visual question answering
OCR text recognition
Multimodal interaction

Use Cases

Education
Chart Comprehension
Helps students understand data in complex charts
Achieves 76.68 points on the ChartQA test set
Document Processing
OCR Enhancement
Recognizes and understands text-image content in scanned documents
Scores 742 on OCRBench
Content Moderation
Multimodal Content Analysis
Simultaneously analyzes image and text content for moderation
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase