UGround V1 7B

Developed by osunlp
UGround is a strong GUI visual positioning (grounding) model trained with a simple recipe, developed jointly by the OSU NLP Group and Orby AI.
Downloads 2,053
Release Time: 1/3/2025

Model Overview

UGround is a GUI visual positioning model built on Qwen2-VL that specializes in locating the coordinates of specific areas, elements, or objects on the screen.

Model Features

Multimodal visual positioning
Locates the (x, y) coordinates of specific areas, elements, or objects on the screen.
High performance
Achieves an average score of 86.3 on the ScreenSpot benchmark.
Agent integration
Can be integrated with devices such as phones or robots to automate operations in visual environments.
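
To make the (x, y) output concrete, here is a minimal Python sketch of turning a model reply into a pixel position on the screenshot. It rests on two assumptions not stated on this page: that the reply contains a point written like "(512, 340)", and that coordinates are given on a relative 0 to 1000 grid. The helper names parse_point and to_pixels are hypothetical; check the official model card for the actual output format.

```python
import re
from typing import Tuple

# Assumption: the reply text contains a point such as "(512, 340)".
_POINT_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_point(reply: str) -> Tuple[float, float]:
    """Extract the first "(x, y)" pair found in a model reply."""
    match = _POINT_RE.search(reply)
    if match is None:
        raise ValueError(f"no (x, y) point found in reply: {reply!r}")
    return float(match.group(1)), float(match.group(2))


def to_pixels(point: Tuple[float, float], width: int, height: int,
              scale: float = 1000.0) -> Tuple[int, int]:
    """Map a point from an assumed [0, scale) grid to pixel coordinates.

    The result is clamped so it always lies inside the image.
    """
    x, y = point
    px = min(max(int(x / scale * width), 0), width - 1)
    py = min(max(int(y / scale * height), 0), height - 1)
    return px, py


# Usage: a reply of "(500, 250)" on a 1920x1080 screenshot.
click_xy = to_pixels(parse_point("(500, 250)"), width=1920, height=1080)
```

Keeping the parsing and the scaling separate means only the scale argument changes if the model turns out to use a different coordinate convention.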

Model Capabilities

GUI visual positioning
Multimodal understanding
Agent operation
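
One way agent operation can be wired together is sketched below in Python: a planner decides what to interact with and describes it in words, and a positioning model such as UGround turns that description into a screen position. Everything here is illustrative. The names Click and agent_step and the two stub functions are hypothetical stand-ins, not an API of UGround or of any planner.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Click:
    """A click action at pixel (x, y), with the description that produced it."""
    x: int
    y: int
    target: str


def agent_step(
    screenshot: bytes,
    goal: str,
    planner: Callable[[bytes, str], str],
    grounder: Callable[[bytes, str], Tuple[int, int]],
) -> Click:
    """Run one plan-then-locate step and return the resulting click."""
    # 1. The planner describes the element to act on in natural language.
    description = planner(screenshot, goal)
    # 2. The grounder maps that description to a point on the screenshot.
    x, y = grounder(screenshot, description)
    return Click(x=x, y=y, target=description)


# Stubs standing in for real model calls (hypothetical, for illustration only).
def fake_planner(screenshot: bytes, goal: str) -> str:
    return "the blue Submit button"


def fake_grounder(screenshot: bytes, description: str) -> Tuple[int, int]:
    return (640, 480)


action = agent_step(b"", "submit the form", fake_planner, fake_grounder)
```

Passing the planner and grounder in as callables keeps the loop independent of any particular model, so either side can be swapped without touching the step logic.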

Use Cases

GUI visual positioning
ScreenSpot benchmark
Evaluates GUI visual positioning under the standard setting
Average score of 86.3, with strong results across multiple subtasks
Agent setup
Used together with a GPT-4o planner
Average score of 84.0, with strong results on mobile and desktop platforms
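
For readers unfamiliar with how scores like these are typically computed, here is a small Python sketch of click-accuracy scoring: a prediction counts as correct when the predicted point falls inside the target element's bounding box. The unweighted mean over subtasks is an assumption, the function names are hypothetical, and the sample data is a toy example rather than benchmark data.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)
Sample = Tuple[Point, Box]               # (predicted point, target box)


def is_hit(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def accuracy(samples: List[Sample]) -> float:
    """Percentage of samples whose predicted point hits the target box."""
    if not samples:
        return 0.0
    hits = sum(is_hit(point, box) for point, box in samples)
    return 100.0 * hits / len(samples)


def average_score(per_subtask: Dict[str, List[Sample]]) -> float:
    """Unweighted mean of per-subtask accuracies (an assumed convention)."""
    scores = [accuracy(samples) for samples in per_subtask.values()]
    return sum(scores) / len(scores)


# Toy data: one hit and one miss on "mobile", one hit on "desktop".
toy = {
    "mobile": [((5, 5), (0, 0, 10, 10)), ((20, 20), (0, 0, 10, 10))],
    "desktop": [((1, 1), (0, 0, 2, 2))],
}
toy_average = average_score(toy)
```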