
PaliGemma 3B FT NLVR2 448

Developed by Google
PaliGemma is a versatile and lightweight vision-language model (VLM) that supports image and text input and generates text output, suitable for various vision-language tasks.
Downloads 2,350
Release Time: 5/13/2024

Model Overview

PaliGemma is built on open components, such as the SigLIP vision model and the Gemma language model. It is designed for tasks like image and short video captioning, visual question answering, text reading, object detection, and segmentation, and supports multiple languages.
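As a rough illustration of the image-plus-text-in, text-out flow described above, here is a minimal sketch using the Hugging Face transformers library. The Hub ID in MODEL_ID is an assumption inferred from this page's title, the function name generate_text is hypothetical, and access to the weights may require accepting a license on the Hub first.

```python
# Assumed Hub ID, inferred from the page title; verify before use.
MODEL_ID = "google/paligemma-3b-ft-nlvr2-448"


def generate_text(image_path, prompt, model_id=MODEL_ID, max_new_tokens=20):
    """Run one image + text prompt through the model and return the generated text.

    Heavy dependencies are imported lazily so that importing this module
    does not require torch, Pillow, or transformers to be installed.
    """
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    # Greedy decoding; only the tokens after the prompt are the model's answer.
    with torch.inference_mode():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

A call such as generate_text("photo.jpg", "caption en") would download the weights on first use, so it is best tried on a machine with enough memory for a 3B-parameter model.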

Model Features

Versatility
Capable of handling various vision-language tasks, including image and short video captioning, visual question answering, text reading, object detection, and object segmentation.
Lightweight
A compact 3B-parameter model built on open components.
Multilingual Support
Supports input and output in multiple languages.

Model Capabilities

Image Caption Generation
Visual Question Answering
Text Reading
Object Detection
Object Segmentation
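Each capability listed above is typically selected through a short task prefix in the text prompt. The sketch below maps a task name to such a prompt. The prefix strings ("caption", "answer", "ocr", "detect", "segment") follow the conventions in the public PaliGemma documentation rather than anything stated on this page, and a fine-tuned checkpoint may expect a different format; the helper build_prompt is hypothetical.

```python
def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a task-prefixed prompt string (prefix conventions are assumed)."""
    task = task.lower()
    if task == "caption":
        # Captioning takes only a language code.
        return f"caption {lang}"
    if task == "vqa":
        if not text:
            raise ValueError("vqa needs a question")
        return f"answer {lang} {text}"
    if task == "ocr":
        # Text reading takes no extra arguments.
        return "ocr"
    if task in ("detect", "segment"):
        if not text:
            raise ValueError(f"{task} needs an object name")
        return f"{task} {text}"
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    print(build_prompt("caption"))
    print(build_prompt("vqa", "What color is the car?"))
    print(build_prompt("detect", "cat"))
```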

Use Cases

Image Understanding
Image Caption Generation
Generate descriptive captions for images, supporting multiple languages.
Generate accurate and diverse image descriptions.
Visual Question Answering
Answer natural language questions about image content.
Reported accuracy: 65.47% on the GQA dataset.
Object Detection and Segmentation
Object Detection
Detect objects in images and generate bounding box coordinates.
Reported to perform well on the OpenImages dataset.
Object Segmentation
Perform pixel-level segmentation of objects in images.
Generate segmentation codes (token sequences that encode each object's mask).
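Since detection results arrive as generated text, the bounding box coordinates have to be parsed out of it. The sketch below assumes the output format described in the public PaliGemma documentation, which this page does not spell out: four `<locNNNN>` tokens per box in the order y_min, x_min, y_max, x_max, each a bin index out of 1024, followed by a label, with boxes separated by ";". The names Box and parse_detections are hypothetical.

```python
import re
from typing import List, NamedTuple


class Box(NamedTuple):
    label: str
    y_min: float
    x_min: float
    y_max: float
    x_max: float


# Four consecutive location tokens followed by a label (assumed format).
_DETECTION = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]+)")
_LOC = re.compile(r"<loc(\d{4})>")


def parse_detections(
    output: str, image_width: int, image_height: int, bins: int = 1024
) -> List[Box]:
    """Convert '<loc....>' token groups into pixel-space bounding boxes."""
    boxes = []
    for match in _DETECTION.finditer(output):
        y0, x0, y1, x1 = (int(v) for v in _LOC.findall(match.group(1)))
        boxes.append(
            Box(
                label=match.group(2).strip(),
                # Bin indices are normalized by the bin count, then scaled.
                y_min=y0 / bins * image_height,
                x_min=x0 / bins * image_width,
                y_max=y1 / bins * image_height,
                x_max=x1 / bins * image_width,
            )
        )
    return boxes


if __name__ == "__main__":
    sample = "<loc0256><loc0128><loc0768><loc0896> cat"
    for box in parse_detections(sample, image_width=1024, image_height=512):
        print(box)
```

Segmentation output would need an additional decoding step for the mask tokens, which this sketch does not attempt.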