
PaliGemma 2 10B Mix 448

Developed by Google
PaliGemma 2 is a vision-language model based on Gemma 2. It takes image and text inputs, generates text outputs, and is suited to a wide range of vision-language tasks.
Downloads: 31.63k
Release Date: 11/21/2024

Model Overview

PaliGemma 2 is an update to the PaliGemma vision-language model that builds on the Gemma 2 language models. It supports tasks such as image captioning, visual question answering, text reading, object detection, and segmentation.

Model Features

Multi-Task Support
Supports various vision-language tasks such as image captioning, visual question answering, text reading, object detection, and segmentation.
Multilingual Capability
Supports text generation and understanding in multiple languages.
High-Resolution Processing
Accepts input images at 448x448 resolution, which can improve accuracy on tasks that depend on fine detail, such as text reading.
Responsible AI
Multiple data-filtering methods were applied to the training data to improve its safety and support responsible use.
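
The 448x448 input resolution determines how many image tokens the language model receives. The following is a minimal sketch, assuming a SigLIP-style image encoder with 14x14-pixel patches and the prompt layout used by the Hugging Face PaliGemmaProcessor (image tokens, then `<bos>`, the prompt, and a newline). Under that assumption, 448 / 14 = 32 patches per side, or 32 x 32 = 1024 image tokens. The function names are hypothetical.

```python
# Sketch: image-token count and decoder input layout for a PaliGemma-style model.
# ASSUMPTIONS (not stated on this page): 14x14-pixel patches, one token per patch,
# and the layout <image>*N + <bos> + prompt + "\n" used by the Hugging Face processor.

def num_image_tokens(resolution: int, patch_size: int = 14) -> int:
    """Each non-overlapping square patch becomes one image token."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of patch_size")
    per_side = resolution // patch_size
    return per_side * per_side


def build_input_text(prompt: str, resolution: int = 448,
                     image_token: str = "<image>", bos_token: str = "<bos>") -> str:
    """Lay out the text sequence the decoder sees before generation starts."""
    return image_token * num_image_tokens(resolution) + bos_token + prompt + "\n"


if __name__ == "__main__":
    # 224 -> 256, 448 -> 1024, 896 -> 4096
    for res in (224, 448, 896):
        print(res, num_image_tokens(res))
    text = build_input_text("caption en")
    print(text.count("<image>"), repr(text[-16:]))
```

Quadrupling the token count relative to a 224x224 input is the cost of the higher resolution: longer sequences and more compute per image.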

Model Capabilities

Image Captioning
Visual Question Answering
Optical Character Recognition
Object Detection
Image Segmentation
Multilingual Text Generation
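
In PaliGemma-family models, the capabilities listed above are selected by a short task prefix in the text prompt, with a language code for multilingual output. The sketch below assumes the prefix conventions published for the PaliGemma mix checkpoints (for example `caption en` or `detect cat ; dog`); the dictionary, function name, and exact templates are illustrative assumptions, not taken from this page.

```python
# Sketch: build task-prefix prompts for the capabilities listed above.
# ASSUMPTION: templates follow the published PaliGemma "mix" prompt conventions;
# verify them against the official model card before relying on them.

TEMPLATES = {
    "caption": "caption {lang}",           # short caption
    "describe": "describe {lang}",         # longer, detailed description
    "ocr": "ocr",                          # read the text in the image
    "answer": "answer {lang} {question}",  # visual question answering
    "detect": "detect {objects}",          # bounding boxes; objects joined by " ; "
    "segment": "segment {objects}",        # segmentation mask
}


def build_prompt(task: str, lang: str = "en", question: str = "",
                 objects: tuple[str, ...] = ()) -> str:
    """Fill the template for `task`; raises KeyError for an unknown task."""
    template = TEMPLATES[task]
    # str.format ignores keyword arguments the template does not use.
    return template.format(lang=lang, question=question,
                           objects=" ; ".join(objects))


if __name__ == "__main__":
    print(build_prompt("caption", lang="de"))
    print(build_prompt("answer", question="what is in the image?"))
    print(build_prompt("detect", objects=("cat", "dog")))
```

Keeping the templates in one table makes it easy to swap the language code for multilingual captioning or question answering without touching the calling code.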

Use Cases

Content Generation
Image Captioning
Generates short captions or detailed descriptions for images.
Achieves a CIDEr score of 142.4 on the COCO-35L dataset (English)
Video Captioning
Generates descriptive captions for short videos.
Visual Understanding
Visual Question Answering
Answers natural language questions about image content.
Achieves 70.8% accuracy on the AOKVQA-DA validation set
Text Reading
Recognizes and extracts text content from images.
Achieves 76.6% accuracy on the DocVQA validation set
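
The accuracy figures above are per-question scores averaged over a benchmark. As a rough sketch, assuming the VQA-style soft accuracy commonly used for direct-answer sets such as A-OKVQA (a prediction earns min(matches / 3, 1) credit against the human reference answers), scoring could look like this. The page does not state the exact evaluation protocol; official evaluators apply further answer normalization, and DocVQA is commonly reported with ANLS rather than exact-match accuracy. The function names are hypothetical.

```python
# Sketch of VQA-style "soft" accuracy for direct-answer benchmarks.
# ASSUMPTION: a prediction earns min(#matching human answers / 3, 1) credit and
# scores are averaged over questions. Official evaluators also normalize
# punctuation, articles and number words; this sketch only lowercases and
# collapses whitespace.

def normalize(answer: str) -> str:
    return " ".join(answer.lower().split())


def soft_accuracy(prediction: str, human_answers: list[str]) -> float:
    """Credit for one question: full credit once 3 annotators agree with the prediction."""
    pred = normalize(prediction)
    matches = sum(normalize(a) == pred for a in human_answers)
    return min(matches / 3.0, 1.0)


def mean_accuracy(predictions: list[str], answers: list[list[str]]) -> float:
    """Average soft accuracy over a dataset, as a percentage."""
    if len(predictions) != len(answers):
        raise ValueError("need one list of human answers per prediction")
    if not predictions:
        return 0.0
    scores = [soft_accuracy(p, a) for p, a in zip(predictions, answers)]
    return 100.0 * sum(scores) / len(scores)
```
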
Computer Vision
Object Detection
Detects objects in images and returns bounding box coordinates.
Image Segmentation
Locates regions occupied by objects in images and generates segmentation masks.
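
Detection and segmentation results are returned as text, so they must be parsed into coordinates. The following is a minimal sketch, assuming the PaliGemma output convention: each object is written as four `<locNNNN>` tokens (y_min, x_min, y_max, x_max, each quantized into 1024 bins relative to the image size) followed by its label, with objects separated by ` ; `; segmentation outputs place 16 `<segNNN>` codebook tokens after the location tokens. Turning those codes into a mask requires a separate decoder that is not shown here. All names are hypothetical.

```python
import re
from dataclasses import dataclass

# ASSUMPTIONS (not stated on this page): <locNNNN> bins in [0, 1023], ordered
# y_min, x_min, y_max, x_max; objects separated by ";"; <segNNN> mask codes.
NUM_BINS = 1024
# Four location tokens, then an optional label up to ";" or the next "<" token.
_OBJECT = re.compile(r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]*)")
_SEG = re.compile(r"<seg(\d{3})>")


@dataclass
class Detection:
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_detections(text: str, width: int, height: int) -> list[Detection]:
    """Convert '<locY0><locX0><locY1><locX1> label ; ...' into pixel boxes."""
    results = []
    for match in _OBJECT.finditer(text):
        y0, x0, y1, x1 = (int(g) for g in match.groups()[:4])
        results.append(Detection(
            label=match.group(5).strip(),  # empty for segmentation strings
            x_min=x0 * width / NUM_BINS,
            y_min=y0 * height / NUM_BINS,
            x_max=x1 * width / NUM_BINS,
            y_max=y1 * height / NUM_BINS,
        ))
    return results


def segmentation_codes(text: str) -> list[int]:
    """Extract <segNNN> codebook indices; decoding them into a mask is not shown."""
    return [int(code) for code in _SEG.findall(text)]
```

For a 640x480 image, `<loc0256><loc0128><loc0768><loc0896> cat` maps to a box spanning x = 80 to 560 and y = 120 to 360 pixels.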