PaliGemma 2 3B PT 448

Developed by Google
PaliGemma 2 is a vision-language model based on Gemma 2. It takes image and text input and generates text output, which makes it suitable for a range of vision-language tasks.
Downloads 3,412
Release Time: 11/21/2024

Model Overview

PaliGemma 2 is an update to the PaliGemma vision-language model, incorporating the capabilities of the Gemma 2 model. It supports tasks such as image and short video captioning, visual question answering, text reading, object detection, and segmentation.
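
The tasks listed above are usually selected through a short text prefix passed alongside the image. The sketch below is illustrative only. It assumes the prefix conventions documented for pretrained PaliGemma checkpoints ("caption en", "answer en ...", "detect ...", "segment ...", "ocr"), the Hugging Face model id google/paligemma2-3b-pt-448 (inferred from this page's title), and the transformers classes AutoProcessor and PaliGemmaForConditionalGeneration. The helper names build_prompt and generate_text are hypothetical. Pretrained ("pt") checkpoints are intended for fine-tuning, so output quality on raw prompts may vary, and access to the weights may require accepting a license on the Hub.

```python
def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a task prefix (assumed PaliGemma convention, not taken from this page)."""
    if task == "caption":
        return f"caption {lang}"
    if task == "ocr":
        return "ocr"
    if task in ("vqa", "detect", "segment") and not text:
        raise ValueError(f"task {task!r} needs a question or object name")
    if task == "vqa":
        return f"answer {lang} {text}"
    if task == "detect":
        return f"detect {text}"
    if task == "segment":
        return f"segment {text}"
    raise ValueError(f"unknown task: {task!r}")


def generate_text(image_path: str, prompt: str,
                  model_id: str = "google/paligemma2-3b-pt-448",  # assumed id
                  max_new_tokens: int = 64) -> str:
    """Run one image + prompt through the model and return the generated text."""
    # Imported here so build_prompt stays usable without these packages.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    # "<image>" marks where the processor inserts the image tokens.
    inputs = processor(text="<image>" + prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

A call such as generate_text("photo.jpg", build_prompt("vqa", "What color is the car?")) would then return the model's answer as a string, where photo.jpg is a placeholder path.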

Model Features

Multi-Task Support
Supports various vision-language tasks, including image captioning, visual question answering, text reading, object detection, and segmentation.
Multilingual Capability
Supports text input and output in multiple languages, suitable for international application scenarios.
High-Resolution Processing
Accepts 448×448-pixel input images, preserving finer visual detail for tasks that depend on it.
Responsible AI
Training data is rigorously filtered to ensure safe and responsible data usage.
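
To give a sense of what the 448-pixel input resolution listed under High-Resolution Processing costs in sequence length, the sketch below counts the image tokens a patch-based vision encoder would produce. It assumes a 14-pixel patch size, as used by the SigLIP encoder in the PaliGemma family; that figure does not appear on this page. The function name image_token_count is hypothetical.

```python
def image_token_count(resolution: int, patch_size: int = 14) -> int:
    """Number of patch tokens for a square image (patch_size=14 is an assumption)."""
    if resolution <= 0 or resolution % patch_size != 0:
        raise ValueError("resolution must be a positive multiple of patch_size")
    patches_per_side = resolution // patch_size
    return patches_per_side * patches_per_side


# Compare the 448-pixel variant with a 224-pixel input.
for res in (224, 448):
    print(res, image_token_count(res))
```

Under this assumption a 448×448 image yields 32 × 32 = 1024 image tokens, four times as many as a 224×224 image (256), which is the trade-off for the extra visual detail.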

Model Capabilities

Image Captioning
Visual Question Answering
Text Reading
Object Detection
Object Segmentation
Multilingual Text Generation
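
For Object Detection, the model's answer is plain text that still has to be turned into boxes. The sketch below is a minimal parser, assuming the PaliGemma output convention of four location tokens per object in the order y_min, x_min, y_max, x_max, each written as <locNNNN> with values from 0 to 1023, followed by the label, with objects separated by " ; ". This format is not described on this page, and parse_detections is a hypothetical helper.

```python
import re

# Four consecutive <locNNNN> tokens followed by a label (assumed output format).
_BOX = re.compile(r"((?:<loc\d{4}>){4})\s*([^<;]+)")
_LOC = re.compile(r"<loc(\d{4})>")


def parse_detections(text: str, width: int, height: int) -> list:
    """Convert detection text into pixel boxes (x_min, y_min, x_max, y_max)."""
    results = []
    for match in _BOX.finditer(text):
        # Token order is assumed to be y_min, x_min, y_max, x_max on a 1024 grid.
        y0, x0, y1, x1 = (int(v) / 1024 for v in _LOC.findall(match.group(1)))
        results.append({
            "label": match.group(2).strip(),
            "box": (x0 * width, y0 * height, x1 * width, y1 * height),
        })
    return results


sample = "<loc0256><loc0128><loc0768><loc0896> cat ; <loc0000><loc0512><loc1023><loc1023> dog"
for det in parse_detections(sample, width=448, height=448):
    print(det["label"], det["box"])
```

The boxes are scaled to whatever width and height are passed in, so the same parser works for the original image size rather than only the 448×448 model input.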

Use Cases

Image Understanding
Image Caption Generation
Generates detailed textual descriptions based on input images.
Achieved a score of 142.4 for English descriptions on the COCO-35L dataset.
Visual Question Answering
Answers natural language questions about image content.
Achieved an accuracy of 71.2% on the AOKVQA-DA validation set.
Document Processing
Document Visual Question Answering
Extracts information from document images and answers questions.
Achieved an accuracy of 76.1% on the DocVQA validation set.
Chart Understanding
Parses chart images and answers related questions.
Achieved an accuracy of 66.4% on the human-written split of ChartQA.