Q

Qwen2.5 VL 3B Instruct GPTQ Int4

Developed by hfl
This is the GPTQ-Int4 quantized version of the Qwen2.5-VL-3B-Instruct model, suitable for multimodal tasks involving image-to-text and text-to-text, supporting both Chinese and English.
Downloads 1,312
Release Time : 2/24/2025

Model Overview

This model is a GPTQ-Int4 quantized version based on Qwen2.5-VL-3B-Instruct, primarily designed for multimodal tasks involving images and text, capable of generating text descriptions related to images or answering relevant questions.

Model Features

Efficient Quantization
Utilizing GPTQ-Int4 quantization technology, it significantly reduces disk space and VRAM requirements while maintaining high performance.
Multimodal Support
Capable of processing both image and text inputs to generate relevant text outputs.
High Performance
Demonstrates excellent performance on benchmarks like ChartQA and OCRBench, approaching the performance of the original model.

Model Capabilities

Image Caption Generation
Image Question Answering
Multimodal Text Generation

Use Cases

Image Understanding
Image Description
Generate detailed descriptions of input images.
Example output: This image shows a bilingual sign in Chinese and English, displaying 'Chinese LLaMA & Alpaca Large Model' and 'Chinese LLaMA & Alpaca Large Language Models'.
Image QA
Answer questions related to image content.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase