
PaliGemma 2 3B Mix 224 (JAX)

Developed by Google
PaliGemma 2 is an upgraded vision-language model based on Gemma 2. It accepts image and multilingual text input, produces text output, and is designed specifically for vision-language tasks.
Downloads: 38
Release time: 2/3/2025

Model Overview

PaliGemma 2 combines the open SigLIP vision model with the open Gemma 2 language model. It performs well on tasks such as image captioning, visual question answering, text reading, object detection, and segmentation.
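As described for the PaliGemma family, the SigLIP image tokens and the text prompt (the "prefix") are fed to the Gemma decoder with full bidirectional attention, while the answer (the "suffix") is generated autoregressively. The sketch below builds that prefix-LM attention mask in plain Python. The 256 image tokens assume a 224-pixel input cut into 14-pixel patches (16 by 16); `num_image_tokens` and `prefix_lm_mask` are hypothetical helpers for illustration, not functions from the model's codebase.

```python
def num_image_tokens(image_size: int = 224, patch_size: int = 14) -> int:
    """Patch tokens for a square image (assumes SigLIP-style 14-px patches)."""
    per_side = image_size // patch_size
    return per_side * per_side


def prefix_lm_mask(num_prefix: int, num_suffix: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key j.

    Positions [0, num_prefix) hold image tokens plus the text prompt and
    attend to one another bidirectionally (but never to the suffix).
    Positions [num_prefix, total) hold the generated answer and attend
    causally: to the whole prefix and to earlier-or-equal suffix positions.
    """
    total = num_prefix + num_suffix
    # For prefix queries, "j <= i" adds nothing beyond "j < num_prefix";
    # for suffix queries, "j < num_prefix" is already covered by "j <= i".
    return [[j < num_prefix or j <= i for j in range(total)] for i in range(total)]


n_img = num_image_tokens()  # 256 for a 224x224 input
mask = prefix_lm_mask(num_prefix=n_img + 4, num_suffix=3)
```

Keeping the prefix bidirectional lets every prompt token see the whole image and the whole instruction before any output token is produced.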

Model Features

Unified Multitask Architecture
A single model handles diverse vision-language tasks, including caption generation, question answering, OCR, object detection, and segmentation
Multilingual Support
Vision-language understanding extends to 34 languages through training datasets such as CC3M-35L
Responsible AI Design
Training data passes through multiple filters (pornographic content, toxicity, and personal information), in line with Google's content safety policies
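In PaliGemma "mix" checkpoints, the task and output language are selected with short text prefixes in the prompt. The sketch below assembles such prompts; the templates follow the prefixes documented for PaliGemma mix models (`caption`, `describe`, `ocr`, `answer`, `question`, `detect`, `segment`), but `build_prompt` itself is a hypothetical helper, and the exact templates should be checked against the official model card.

```python
# Hypothetical helper. Templates follow the task prefixes documented for
# PaliGemma "mix" checkpoints; verify them against the official model card.
TEMPLATES = {
    "caption": "caption {lang}",             # short caption
    "describe": "describe {lang}",           # longer description
    "ocr": "ocr",                            # read text in the image
    "answer": "answer {lang} {text}",        # visual question answering
    "question": "question {lang} {text}",    # generate a question for an answer
    "detect": "detect {text}",               # objects separated by " ; "
    "segment": "segment {text}",             # segmentation of one object
}


def build_prompt(task: str, lang: str = "en", text: str = "") -> str:
    """Return the task-prefix prompt for a supported task."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    template = TEMPLATES[task]
    if "{text}" in template and not text:
        raise ValueError(f"task {task!r} needs a question, answer, or object name")
    # str.format ignores keyword arguments the template does not use.
    return template.format(lang=lang, text=text)


print(build_prompt("caption", lang="es"))                 # caption es
print(build_prompt("answer", text="what is on the table?"))
print(build_prompt("detect", text="cat ; dog"))
```

Passing a language code such as `es` is how the multilingual support surfaces at inference time.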

Model Capabilities

Image Caption Generation
Visual Question Answering
Optical Character Recognition
Object Detection
Image Segmentation
Multilingual Understanding
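For object detection, PaliGemma models emit boxes as text: four `<locNNNN>` tokens (y_min, x_min, y_max, x_max on a 1024-bin grid) followed by the label, with multiple objects separated by `;`. The sketch below converts that text into pixel coordinates. It assumes this documented output format and the divide-by-1024 convention used in Google's PaliGemma examples; `parse_detections` and `Box` are hypothetical names.

```python
import re
from typing import NamedTuple


class Box(NamedTuple):
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


# Four 4-digit location tokens (y_min, x_min, y_max, x_max), then a label
# that runs until the next ";" separator or the next location token.
_DETECTION = re.compile(r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)")


def parse_detections(output: str, width: int, height: int) -> list[Box]:
    """Convert PaliGemma-style detection text into pixel-space boxes.

    Assumes each coordinate is an integer in [0, 1023] that maps to the
    image extent via value / 1024.
    """
    boxes = []
    for m in _DETECTION.finditer(output):
        y0, x0, y1, x1 = (int(m.group(i)) / 1024 for i in range(1, 5))
        boxes.append(
            Box(
                label=m.group(5).strip(),
                x_min=x0 * width,
                y_min=y0 * height,
                x_max=x1 * width,
                y_max=y1 * height,
            )
        )
    return boxes


text = "<loc0256><loc0512><loc0768><loc1023> cat ; <loc0000><loc0000><loc0512><loc0512> dog"
for box in parse_detections(text, width=640, height=480):
    print(box)
```

Segmentation output adds `<segNNN>` codebook tokens after the box; decoding those into a mask requires the model's own decoder, so it is not sketched here.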

Use Cases

Assistive Technology
Visual Assistance for Blind Users
Describes image content for visually impaired users
Achieves an accuracy of 64.2 on the AOKVQA validation set
Document Processing
Scene Text Recognition
Extracts text content from natural scene images
Achieves an F1 score of 75.9 on the ICDAR 2015 benchmark
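An F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes it on a 0 to 100 scale from true-positive, false-positive, and false-negative counts. The counts in the usage lines are made-up illustrations, not ICDAR 2015 results, and the benchmark's own rules for matching predicted text regions to ground truth are not modeled.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, on a 0-100 scale."""
    predicted = true_pos + false_pos   # everything the model reported
    actual = true_pos + false_neg      # everything in the ground truth
    if true_pos == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_pos / predicted
    recall = true_pos / actual
    return 100 * 2 * precision * recall / (precision + recall)


print(round(f1_score(8, 2, 2), 1))  # precision 0.8, recall 0.8
print(round(f1_score(6, 4, 0), 1))  # precision 0.6, recall 1.0
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are high.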
Content Moderation
Image Safety Analysis
Detects sensitive content in images
Meets safety thresholds in human evaluation