OmniParser V2.0

Developed by Microsoft
OmniParser is a universal screen parsing tool that converts UI screenshots into structured formats to improve the performance of LLM-based UI agents.
Downloads 6,729
Release Time: 2/12/2025

Model Overview

OmniParser is designed to transform unstructured screenshot images into structured element lists, including interactive area locations and potential functional descriptions of icons. It is suitable for various types of screenshots (including PC and mobile) and multiple application scenarios.
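As a rough illustration of what such a structured element list might look like, the sketch below models parsed elements in Python. The field names (`kind`, `bbox`, `interactive`, `description`) and the normalized-coordinate convention are assumptions made for this example, not OmniParser's documented output schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    """One parsed screen element (hypothetical schema, for illustration only)."""
    kind: str                                # e.g. "icon" or "text"
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed normalized to 0..1
    interactive: bool                        # whether the region can be acted on
    description: str                         # functional description of the element


def interactive_elements(elements: List[UIElement]) -> List[UIElement]:
    """Keep only the elements an agent could act on."""
    return [e for e in elements if e.interactive]


def center_in_pixels(element: UIElement, width: int, height: int) -> Tuple[int, int]:
    """Convert a normalized box to the pixel coordinates of its center."""
    x1, y1, x2, y2 = element.bbox
    return round((x1 + x2) / 2 * width), round((y1 + y2) / 2 * height)


# Hand-written sample data standing in for a parsed screenshot.
sample = [
    UIElement("icon", (0.10, 0.20, 0.20, 0.30), True, "Settings"),
    UIElement("text", (0.30, 0.05, 0.70, 0.10), False, "Welcome back"),
]
```

With a list like this, an agent only needs to reason over short text descriptions and can map its choice back to a click position with `center_in_pixels`.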

Model Features

Efficient Parsing
Compared to V1, latency is reduced by 60%, achieving 0.6 seconds per frame on A100 and 0.8 seconds on a single RTX 4090.
Large-scale Dataset
The training data combines an interactive icon detection dataset and an icon description dataset, both larger and cleaner than those used for V1.
Strong Performance
Achieves an average accuracy of 39.6 on ScreenSpot Pro.
Multi-model Support
Out-of-the-box support for large language models from OpenAI, DeepSeek, and Qwen, as well as Anthropic Computer Use.
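To put the latency figures above in perspective, the short sketch below converts seconds per frame into frames per second and back-computes the latency implied before a fractional reduction. The helper names are hypothetical, and applying the 60% reduction to the A100 figure is an assumption; the text does not say which hardware the comparison with V1 refers to.

```python
def frames_per_second(seconds_per_frame: float) -> float:
    """Throughput implied by a per-frame latency."""
    return 1.0 / seconds_per_frame


def implied_baseline(latency: float, reduction: float) -> float:
    """Latency before a fractional reduction: latency / (1 - reduction)."""
    return latency / (1.0 - reduction)


a100_fps = frames_per_second(0.6)        # A100 figure from the text
rtx4090_fps = frames_per_second(0.8)     # RTX 4090 figure from the text

# Assumption: the 60% reduction is measured against the A100 latency.
v1_estimate = implied_baseline(0.6, 0.60)

print(f"A100: {a100_fps:.2f} fps, RTX 4090: {rtx4090_fps:.2f} fps")
print(f"Implied V1 latency under the stated assumption: {v1_estimate:.2f} s")
```

This works out to roughly 1.67 and 1.25 frames per second, and an implied V1 latency of about 1.5 seconds per frame if the assumption holds.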

Model Capabilities

UI Screenshot Parsing
Interactive Area Detection
Icon Function Description
Structured Data Conversion

Use Cases

UI Agent Development
LLM-based GUI Agent
Control a Windows 11 virtual machine using OmniParser together with a vision model of your choice.
Enhances the agent's understanding and operational capabilities for UIs.
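A single step of such an agent could be sketched as follows. This is a minimal illustration, not OmniParser's actual agent code: the element dictionaries, the prompt layout, the action format, and the `keyword_chooser` function (a stand-in for a real model call) are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def build_prompt(elements: List[Dict], goal: str) -> str:
    """List the parsed elements by index so a model can pick one."""
    lines = [f"{i}: {e['description']}" for i, e in enumerate(elements)]
    return f"Goal: {goal}\nElements:\n" + "\n".join(lines)


def agent_step(elements: List[Dict], goal: str, choose: Callable[[str], int]) -> Dict:
    """Ask the chooser for an element index and turn it into a click action."""
    index = choose(build_prompt(elements, goal))
    x1, y1, x2, y2 = elements[index]["bbox"]  # assumed pixel coordinates
    return {"action": "click", "x": (x1 + x2) // 2, "y": (y1 + y2) // 2}


def keyword_chooser(prompt: str) -> int:
    """Stand-in for a model: pick the first element named in the goal."""
    goal_line, _, listing = prompt.partition("\nElements:\n")
    goal = goal_line[len("Goal: "):].lower()
    for line in listing.splitlines():
        index, _, description = line.partition(": ")
        if description.lower() in goal:
            return int(index)
    return 0  # fall back to the first element


# Hand-written elements standing in for a parsed screenshot.
screen = [
    {"description": "Search", "bbox": (100, 40, 300, 80)},
    {"description": "Settings", "bbox": (900, 20, 940, 60)},
]
action = agent_step(screen, "open settings", keyword_chooser)
```

In a real agent, `keyword_chooser` would be replaced by a call to the chosen model, and the returned action would be sent to the virtual machine.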
Automated Testing
UI Element Detection
Automatically detect and describe interactive elements in applications.
Improves test coverage and efficiency.
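For the testing use case, one simple check is whether every control a test expects was actually detected on screen. The sketch below assumes the parser's element descriptions are available as plain strings; the function name and the case-insensitive exact-match rule are illustrative choices, not part of OmniParser.

```python
from typing import Iterable, Set, Tuple


def check_coverage(detected: Iterable[str], expected: Set[str]) -> Tuple[float, Set[str]]:
    """Return the fraction of expected controls that were detected, plus the missing ones."""
    found = {d.strip().lower() for d in detected}
    wanted = {e.strip().lower() for e in expected}
    missing = wanted - found
    coverage = 1.0 if not wanted else (len(wanted) - len(missing)) / len(wanted)
    return coverage, missing


# Hand-written descriptions standing in for detected elements.
coverage, missing = check_coverage(
    detected=["Save", "Cancel", "Help"],
    expected={"save", "cancel", "delete"},
)
```

A test could then fail whenever `missing` is non-empty and report the missing controls by name.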