PaliGemma 3B FT VQAv2 224

Developed by Google
PaliGemma is a versatile, lightweight vision-language model that takes image and text inputs, generates text outputs, and supports multiple languages.
Downloads 150
Release Time: 5/12/2024

Model Overview

PaliGemma is designed to deliver strong fine-tuned performance on vision-language tasks. It can be used for image and short-video captioning, visual question answering, text reading, object detection, and object segmentation.

Model Features

Versatility
Combines image and text inputs to generate text outputs and supports multiple languages.
Lightweight
Built on open components, making it easy to use and deploy.
High Performance
Performs well in various vision-language tasks, such as question answering, caption generation, and segmentation.

Model Capabilities

Image Caption Generation
Visual Question Answering
Object Detection
Object Segmentation
Multilingual Support
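
Each capability above is typically selected through a short text prefix in the prompt. The sketch below shows one way to build those prompts and run a visual question answering query with Hugging Face Transformers. It rests on assumptions that this page does not state: the prefix strings ("caption", "answer", "detect", "segment") follow the public PaliGemma documentation, the checkpoint id google/paligemma-3b-ft-vqav2-224 is assumed to be the Hub name of this model, and build_prompt and answer_question are hypothetical helper names.

```python
def build_prompt(task, text="", lang="en"):
    """Build a task prompt. Prefixes are assumed from the public PaliGemma docs."""
    if task == "caption":
        return f"caption {lang}"
    if task == "vqa":
        return f"answer {lang} {text}"
    if task == "detect":
        return f"detect {text}"
    if task == "segment":
        return f"segment {text}"
    raise ValueError(f"unknown task: {task!r}")


def answer_question(image_path, question,
                    model_id="google/paligemma-3b-ft-vqav2-224"):  # assumed Hub id
    """Answer a question about an image (downloads the checkpoint on first use)."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt("vqa", question), images=image,
                       return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Because this checkpoint is fine-tuned on VQAv2, the "vqa" prompt is the one most likely to behave well. The other prefixes are included only to illustrate the prompt convention. Access to the weights may require accepting the model license on the hosting site.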

Use Cases

Image Processing
Image Caption Generation
Generate descriptive captions for images, supporting multiple languages.
The CIDEr score on the COCO Captions validation set is 141.92 at 224×224 input resolution.
Object Detection
Detect objects in the image and generate bounding box coordinates.
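
Bounding boxes come back as text rather than as numeric arrays. A minimal sketch of decoding them follows, assuming the output convention described in the public PaliGemma documentation: four location tokens of the form <locNNNN> per box, in the order y_min, x_min, y_max, x_max, each divided by 1024 to give a fraction of the image size, followed by the label, with boxes separated by semicolons. The function name parse_detections is hypothetical.

```python
import re

# Four consecutive <locNNNN> tokens followed by a label (assumed output format).
_BOX = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(text, width, height):
    """Convert detection text into pixel boxes as (x_min, y_min, x_max, y_max)."""
    results = []
    for match in _BOX.finditer(text):
        # Token order is assumed to be y_min, x_min, y_max, x_max on a 1024 grid.
        y0, x0, y1, x1 = (int(match.group(i)) / 1024 for i in range(1, 5))
        results.append({
            "label": match.group(5).strip(),
            "box": (x0 * width, y0 * height, x1 * width, y1 * height),
        })
    return results


if __name__ == "__main__":
    sample = "<loc0256><loc0128><loc0768><loc0512> cat"
    print(parse_detections(sample, width=1024, height=512))
```

Scaling by the original image width and height, rather than by 224, maps the boxes back onto the image as it was before resizing.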
Question Answering System
Visual Question Answering
Answer questions about the content of the image.
The accuracy on the VQAv2 test set is 83.19% at 224×224 input resolution.
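
For context on how a VQAv2 accuracy figure is computed: each question has ten human answers, and a prediction earns full credit when at least three annotators gave the same answer. The sketch below is a simplified version of that rule. It is not the official evaluation script, which also normalizes punctuation, articles, and number words and averages over subsets of annotators. The function names are hypothetical.

```python
def vqa_accuracy(prediction, human_answers):
    """Simplified VQA score for one question: min(1, matching annotators / 3)."""
    pred = prediction.strip().lower()
    matches = sum(1 for answer in human_answers if answer.strip().lower() == pred)
    return min(1.0, matches / 3)


def mean_vqa_accuracy(predictions, answers_per_question):
    """Average the per-question score over a dataset, as a percentage."""
    scores = [
        vqa_accuracy(pred, answers)
        for pred, answers in zip(predictions, answers_per_question)
    ]
    return 100.0 * sum(scores) / len(scores)
```

Under this rule an answer given by only one or two annotators still earns partial credit, which is why reported accuracy is not a plain exact-match rate.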