Qwen2.5 VL 3B UI R1 E

Developed by LZXzju
UI-R1-E-3B is an efficient GUI grounding model fine-tuned from Qwen2.5-VL-3B-Instruct. It targets visual question answering over user interface screenshots and is particularly strong at locating and identifying actionable elements.
Downloads 75
Release Time: 5/14/2025

Model Overview

This model uses reinforcement learning to improve the action prediction of GUI agents: it accurately identifies actionable elements in a user interface and predicts the required action (such as a click) together with its coordinates.
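To make the "action plus coordinates" prediction concrete, the following minimal sketch parses a model response into a structured action. The response format shown in `sample` (an `<answer>` tag wrapping a list with `action` and `coordinate` keys) is an assumption for illustration and is not confirmed by this page; `PredictedAction` and `parse_prediction` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional


class PredictedAction(NamedTuple):
    """A parsed prediction: action type plus pixel coordinates (hypothetical helper)."""
    action: str
    x: int
    y: int


# Assumed response shape: [{'action': 'click', 'coordinate': [x, y]}]
# Either single or double quotes are accepted around keys and the action name.
_PATTERN = re.compile(
    r"""['"]action['"]\s*:\s*['"](?P<action>\w+)['"]\s*,\s*"""
    r"""['"]coordinate['"]\s*:\s*\[\s*(?P<x>\d+)\s*,\s*(?P<y>\d+)\s*\]"""
)


def parse_prediction(text: str) -> Optional[PredictedAction]:
    """Return the first action found in the response, or None if nothing matches."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return PredictedAction(match["action"], int(match["x"]), int(match["y"]))


# Example response in the assumed format.
sample = "<answer>[{'action': 'click', 'coordinate': [412, 87]}]</answer>"
print(parse_prediction(sample))
```

Returning `None` on a malformed response, rather than raising, lets a calling agent decide whether to retry or skip the step.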

Model Features

Efficient GUI Grounding
Precisely locates actionable elements in user interface screenshots and predicts click coordinates
Reasoning Without Explicit Thinking
Runs inference faster and with higher accuracy than variants that generate an explicit thought process
Multi-Platform Support
Delivers strong performance on mobile, desktop, and web interfaces

Model Capabilities

GUI Element Recognition
Operation Instruction Understanding
Coordinate Positioning Prediction
Cross-Platform Interface Analysis

Use Cases

Automated Testing
UI Automated Testing
Automatically identifies interface elements and performs test operations
Achieves an average accuracy of 89.5% on the ScreenSpotV2 benchmark
Accessibility
Visual Impairment Assistance
Helps visually impaired users understand the positions of interface elements
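For the benchmark accuracy quoted under Automated Testing, this page does not say how a prediction is scored. A common convention for GUI grounding benchmarks is to count a prediction as correct when the predicted click point falls inside the target element's bounding box. The sketch below implements that convention as an assumption; `point_in_box` and `click_accuracy` are hypothetical helper names, and the sample data is made up purely to exercise the function.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def point_in_box(point: Point, box: Box) -> bool:
    """True if the point lies inside the box, edges included."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def click_accuracy(points: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Fraction of predicted click points that land inside their target box."""
    if len(points) != len(boxes):
        raise ValueError("points and boxes must have the same length")
    if not points:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(points, boxes))
    return hits / len(points)


# Illustrative data only: three of the four clicks hit their target box.
predicted = [(10, 10), (50, 50), (5, 5), (100, 100)]
targets = [(0, 0, 20, 20), (0, 0, 20, 20), (0, 0, 10, 10), (90, 90, 110, 110)]
print(click_accuracy(predicted, targets))
```

The same point-in-box check can serve as a pass/fail assertion in a UI test when the expected element bounds are known.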