G

Granite Vision 3.2 2b

Developed by unsloth
granite-vision-3.2-2b is a compact and efficient vision-language model specifically designed for visual document understanding, capable of automatically extracting content from tables, charts, infographics, and more.
Downloads 43
Release Time : 3/14/2025

Model Overview

This model is trained on a carefully curated instruction-following dataset, including diverse public datasets and synthetic datasets tailored for broad document understanding and general image tasks. It is fine-tuned from the Granite large language model with image and text modalities.

Model Features

Efficient Visual Document Understanding
Capable of automatically extracting content from tables, charts, infographics, drawings, and diagrams.
Multimodal Capabilities
Processes both visual and textual data, suitable for a wide range of business scenarios.
High Performance
Outperforms comparable models on multiple document understanding benchmarks.
Lightweight Design
Only 2B parameters, delivering powerful performance while maintaining efficiency.

Model Capabilities

Table Analysis
Chart Understanding
Infographic Parsing
Optical Character Recognition (OCR)
Document Content Question Answering
General Image Understanding
Visual Question Answering

Use Cases

Document Processing
Document Question Answering
Answer questions based on document content
Achieved 0.89 accuracy on the DocVQA benchmark
Chart Analysis
Extract and analyze data from charts
Achieved 0.87 accuracy on the ChartQA benchmark
General Visual Understanding
Visual Question Answering
Answer questions about image content
Achieved 0.78 accuracy on the VQAv2 benchmark
Real-World Scene Understanding
Understand content in real-world images
Achieved 0.63 accuracy on the RealWorldQA benchmark
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase