
Paligemma 3b Ft Widgetcap Waveui 448

Developed by agentsea
A vision-language model fine-tuned for UI object detection on the WaveUI dataset, built on the 448×448-resolution PaliGemma 3B weights
Downloads: 344
Release date: 7/8/2024

Model Overview

A vision-language model for detecting user-interface (UI) elements, built as a key component of AgentSea's open-source toolkit for building agents.

Model Features

UI-Focused Element Detection
Fine-tuned on the WaveUI dataset specifically to improve detection of UI elements
PaliGemma-Based Architecture
Built on Google's PaliGemma 3B, a vision-language model that takes an image plus a text prompt and generates text output
Open-Source Agent Support
Serves as a core component of AgentSea's open-source toolkit for building agents

Model Capabilities

UI Element Detection
Multimodal Understanding
Object Localization
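PaliGemma models report object locations as text. In Google's published PaliGemma detection format, each box is four `<locNNNN>` tokens (y_min, x_min, y_max, x_max, each quantized into 1024 bins) followed by the label, with multiple detections separated by ` ; `. The sketch below assumes this fine-tuned model keeps that output format. The helper `parse_detections` is hypothetical and is not part of any AgentSea or Hugging Face API.

```python
import re
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates
Box = Tuple[float, float, float, float]

# One detection: exactly four <locNNNN> tokens, then a label up to ";" or the next "<".
# Assumption: output follows PaliGemma's format, e.g. "<loc0100><loc0200><loc0300><loc0400> button".
_DETECTION = re.compile(r"((?:<loc\d{4}>){4})\s*([^<;]*)")
_LOC = re.compile(r"<loc(\d{4})>")


def parse_detections(text: str, width: int, height: int) -> List[Tuple[str, Box]]:
    """Convert PaliGemma-style location tokens into labeled pixel boxes (hypothetical helper)."""
    results = []
    for tokens, label in _DETECTION.findall(text):
        # Token order is y_min, x_min, y_max, x_max; each value is a bin in [0, 1023].
        y_min, x_min, y_max, x_max = (int(v) / 1024 for v in _LOC.findall(tokens))
        box = (x_min * width, y_min * height, x_max * width, y_max * height)
        results.append((label.strip(), box))
    return results


if __name__ == "__main__":
    sample = "<loc0000><loc0000><loc0512><loc0512> button ; <loc0512><loc0512><loc1023><loc1023> input field"
    for label, box in parse_detections(sample, width=200, height=100):
        print(label, box)
```

Scaling by 1024 mirrors how PaliGemma's reference code de-quantizes location bins; if this checkpoint was trained with a different convention, only the divisor needs to change.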

Use Cases

UI Automation
Interface Element Recognition
Automatically identifies interface elements such as buttons and input fields in applications
Achieves an IoU (intersection over union) of 0.40 on the test set
Agent Development
Automated Testing
Used to build testing agents that can understand user interfaces
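The reported 0.40 figure is an intersection-over-union (IoU) score: the area where a predicted box overlaps the ground-truth box, divided by the area of their union (1.0 is a perfect match, 0.0 means no overlap). The card does not say how per-box scores were aggregated, so the `mean_iou` helper below assumes a simple average over already-paired predicted and reference boxes; both function names are illustrative only.

```python
from typing import Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], targets: Sequence[Box]) -> float:
    """Average IoU over paired boxes (assumed aggregation; not confirmed by the card)."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must be paired one-to-one")
    if not preds:
        return 0.0
    return sum(iou(p, t) for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    # Two 10x10 boxes offset by 5 px horizontally: overlap 50, union 150.
    print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A prediction shifted by half its width already drops to an IoU of about 0.33, which gives a rough sense of scale for the 0.40 result.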