
PaliGemma 3B FT ScienceQA 448

Developed by Google
PaliGemma is a lightweight 3B-parameter vision-language model developed by Google. It is built on the SigLIP vision model and the Gemma language model, accepts image and text inputs, and generates text outputs.
Release Time: 5/13/2024

Model Overview

A versatile vision-language model for image captioning, visual question answering, text reading, object detection, and segmentation, with multilingual support.

Model Features

Lightweight Design
With only 3B parameters, it is suited to deployment in resource-constrained environments
Multi-task Adaptation
Configurable for different vision-language tasks via task prefixes (e.g., 'detect' or 'segment')
Multi-resolution Support
Available in 224, 448, and 896 px input-resolution versions for different levels of visual detail; this checkpoint uses 448 px
Responsible Training
Training data is safety-filtered to remove pornographic, toxic, and personally identifiable content
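The resolution versions differ mainly in how many image tokens the language model must attend over. The sketch below assumes the SigLIP encoder splits the image into square 14-pixel patches (the So400m/14 variant; the patch size is not stated on this page), so the token count is (resolution / 14)²:

```python
# Assumption: SigLIP image encoder with 14x14-pixel patches (So400m/14).
PATCH_SIZE = 14


def image_token_count(resolution: int, patch_size: int = PATCH_SIZE) -> int:
    """Number of image tokens for a square input with the given side length."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of the patch size")
    patches_per_side = resolution // patch_size
    return patches_per_side * patches_per_side


for res in (224, 448, 896):
    print(f"{res}px -> {image_token_count(res)} image tokens")
# 224px -> 256 image tokens
# 448px -> 1024 image tokens
# 896px -> 4096 image tokens
```

Because the count grows quadratically with side length, the 448 px checkpoint processes four times as many image tokens as the 224 px version, which is the compute cost of its finer detail.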

Model Capabilities

Image caption generation
Visual question answering
Text reading
Object detection
Image segmentation
Multilingual processing
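Each capability is selected with a task prefix in the text prompt. The sketch below builds prompts using prefix strings as described in Google's PaliGemma documentation ("caption en", "ocr", "answer en ...", "detect ...", "segment ..."), and parses detection output under the assumption that each box is four `<locNNNN>` tokens (y_min, x_min, y_max, x_max, as bins on a 1024-step grid) followed by a label, with boxes separated by " ; ". The helper names `build_prompt` and `parse_detections` are hypothetical, and the exact strings should be checked against the official docs.

```python
import re
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a task-prefixed prompt (hypothetical helper; prefixes assumed)."""
    if task == "caption":
        return f"caption {lang}"
    if task == "ocr":
        return "ocr"
    if task == "answer":
        return f"answer {lang} {text}"
    if task in ("detect", "segment"):
        return f"{task} {text}"
    raise ValueError(f"unknown task: {task}")


# Four location tokens, optional whitespace, then a label up to ';' or '<'.
_DETECTION = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]+)")
_LOC_VALUE = re.compile(r"<loc(\d{4})>")


def parse_detections(output: str, width: int, height: int) -> List[Tuple[str, Box]]:
    """Convert assumed detect-style output into labelled pixel boxes.

    Handles detection output only; segmentation adds extra tokens not parsed here.
    """
    detections = []
    for locs, label in _DETECTION.findall(output):
        # Assumed order y_min, x_min, y_max, x_max, each a bin index out of 1024.
        y0, x0, y1, x1 = (int(v) / 1024 for v in _LOC_VALUE.findall(locs))
        box = (x0 * width, y0 * height, x1 * width, y1 * height)
        detections.append((label.strip(), box))
    return detections


if __name__ == "__main__":
    print(build_prompt("detect", "cat ; dog"))
    sample = "<loc0256><loc0128><loc0768><loc0512> cat ; <loc0000><loc0000><loc0512><loc1023> dog"
    for label, box in parse_detections(sample, width=640, height=480):
        print(label, box)
```

Keeping prompt construction and output parsing in separate functions makes it easy to swap in corrected prefix strings or coordinate conventions without touching the rest of a pipeline.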

Use Cases

Education
Science QA System
Scientific question answering based on the ScienceQA dataset
This checkpoint is fine-tuned on the ScienceQA benchmark, where it is reported to perform strongly
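ScienceQA items are multiple-choice questions, so using the model involves turning a question and its options into a prompt and mapping the generated answer back to an option. The sketch below is a hypothetical layout: it assumes the "answer en" prefix followed by the question and lettered options, which may not match the format this checkpoint was actually fine-tuned on. `format_scienceqa_prompt` and `match_choice` are illustrative names, not part of any library.

```python
import string
from typing import List, Optional


def format_scienceqa_prompt(question: str, choices: List[str], lang: str = "en") -> str:
    """Hypothetical prompt layout: 'answer <lang> <question> A. ... B. ...'."""
    options = " ".join(
        f"{letter}. {choice}" for letter, choice in zip(string.ascii_uppercase, choices)
    )
    return f"answer {lang} {question} {options}"


def match_choice(model_output: str, choices: List[str]) -> Optional[int]:
    """Map generated text to a choice index, by option letter or by exact choice text."""
    answer = model_output.strip().rstrip(".")
    letters = string.ascii_uppercase[: len(choices)]
    if len(answer) == 1 and answer.upper() in letters:
        return letters.index(answer.upper())
    for i, choice in enumerate(choices):
        if answer.lower() == choice.strip().lower():
            return i
    return None  # unparseable answers count as incorrect when scoring


if __name__ == "__main__":
    choices = ["shark", "whale"]
    print(format_scienceqa_prompt("Which of these animals is a mammal?", choices))
    print(match_choice("Whale.", choices))
```

Returning `None` for unmatched output, rather than guessing, keeps an accuracy computation honest: such answers are simply scored as wrong.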
Assistive Technology
Visual Impairment Assistance
Describing image content for visually impaired users
Content Moderation
Image Safety Analysis
Detecting sensitive or inappropriate content in images