
UGround V1 72B

Developed by osunlp
UGround is a strong GUI visual grounding model trained with a simple recipe, focused on image-text-to-text multimodal tasks.
Downloads: 129
Release date: 1/11/2025

Model Overview

UGround is a GUI visual grounding model jointly developed by OSU NLP (osunlp) and Orby AI. Built on the Qwen2-VL architecture, it handles multimodal interaction tasks between images and text.

Model Features

Strong GUI Visual Grounding
UGround accurately understands and locates elements in graphical user interfaces, enabling efficient image-text interaction.
Multimodal Support
The model supports multimodal interaction between images and text and can handle complex visual and language tasks.
Based on the Qwen2-VL Architecture
Built on the Qwen2-VL-72B architecture, the model offers strong computational capability and processing efficiency.

Model Capabilities

Image-text interaction
GUI element localization
Multimodal task processing
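As a rough sketch of how a grounding model of this kind is typically used: the model receives a screenshot plus a natural-language description of a target element and answers with a point in text form, which client code must parse and rescale to the screenshot's pixel size. The prompt convention, the "(x, y)" output format, and the 0-999 normalized grid are assumptions (a common convention for Qwen2-VL-based grounders), not details taken from this page.

```python
import re

def parse_ground_point(model_output: str, img_w: int, img_h: int,
                       norm_range: int = 1000) -> tuple[int, int]:
    """Parse a predicted point like "(352, 648)" from the model's text output.

    Assumption: the model emits coordinates normalized to a 0..norm_range-1
    grid; rescale them to the actual screenshot resolution.
    """
    m = re.search(r"\((\d+),\s*(\d+)\)", model_output)
    if m is None:
        raise ValueError(f"no coordinate found in: {model_output!r}")
    nx, ny = int(m.group(1)), int(m.group(2))
    # Map normalized grid coordinates to pixel coordinates.
    return (round(nx / norm_range * img_w), round(ny / norm_range * img_h))

# Example: a 1920x1080 screenshot, model answered "(500, 500)".
print(parse_ground_point("(500, 500)", 1920, 1080))  # -> (960, 540)
```

The rescaling step matters because the model sees a resized image internally; feeding raw model coordinates to a clicking tool without it would miss the target on most screen resolutions.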

Use Cases

GUI Automation
Screen Element Localization
Used in automated testing to locate and manipulate GUI elements on the screen.
Improves the accuracy and efficiency of automated testing.
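In automated-testing pipelines, a grounded point is usually validated against the target element's bounding box before a click is issued; the same point-in-box check is the standard accuracy metric on GUI grounding benchmarks. A minimal sketch (the `Box` layout and all names here are hypothetical, not from this page):

```python
from typing import NamedTuple

class Box(NamedTuple):
    # Element bounding box in pixels (left, top, width, height).
    x: int
    y: int
    w: int
    h: int

def point_hits(box: Box, px: int, py: int) -> bool:
    """Return True if a grounded point lands inside the element's box."""
    return box.x <= px < box.x + box.w and box.y <= py < box.y + box.h

def grounding_accuracy(preds, boxes) -> float:
    """Fraction of predicted points that hit their target elements."""
    hits = sum(point_hits(b, *p) for p, b in zip(preds, boxes))
    return hits / len(boxes)

# Hypothetical example: two targets, one hit.
boxes = [Box(100, 50, 200, 40), Box(400, 300, 80, 80)]
preds = [(150, 60), (10, 10)]
print(grounding_accuracy(preds, boxes))  # -> 0.5
```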
Multimodal Interaction
Image Caption Generation
Generates detailed textual descriptions based on image content.
Enhances the quality of image understanding and description.