
GUI-Actor-2B-Qwen2-VL

Developed by Microsoft
GUI-Actor-2B is a vision-language model based on Qwen2-VL-2B, designed for graphical user interface (GUI) positioning (grounding) tasks. It adds an attention-based action head to the backbone and is fine-tuned for the task, and it performs well on multiple GUI positioning benchmarks.
Downloads 163
Release Time: 6/1/2025

Model Overview

This model performs GUI positioning: given a screenshot and a natural-language instruction, it predicts the position on the screen where the operation should be carried out.

Model Features

Based on the Qwen2-VL backbone model
Built on the Qwen2-VL-2B vision-language model, inheriting its visual understanding ability
Dedicated action head design
Adds an attention-based action head that specifically targets GUI positioning tasks
Strong performance on multiple benchmarks
Achieves leading results on GUI positioning benchmarks including ScreenSpot-Pro, ScreenSpot, and ScreenSpot-v2
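
To make the idea of an attention-based action head more concrete, here is a minimal Python sketch. It is not Microsoft's implementation: the helper names `softmax` and `patch_to_point`, the row-major patch grid, and the choice to return the center of the single highest-weighted patch are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def patch_to_point(scores, grid_w, grid_h):
    """Hypothetical helper: pick the image patch with the highest attention
    weight and return its center as a normalized (x, y) point in [0, 1].

    Assumes `scores` holds one score per patch in row-major order.
    """
    if len(scores) != grid_w * grid_h:
        raise ValueError("expected exactly one score per patch")
    weights = softmax(scores)
    best = max(range(len(weights)), key=weights.__getitem__)
    row, col = divmod(best, grid_w)
    return ((col + 0.5) / grid_w, (row + 0.5) / grid_h)

# Example: a 4x2 patch grid where the patch at row 1, column 2 scores highest.
scores = [0.1, 0.2, 0.1, 0.0,
          0.3, 0.1, 2.5, 0.2]
point = patch_to_point(scores, grid_w=4, grid_h=2)
```

In this sketch the action is read directly from where the attention mass falls on the screenshot, rather than being generated as coordinate text.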

Model Capabilities

GUI element positioning
Vision-Language understanding
Screen instruction understanding
Operation point prediction
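
As a rough illustration of operation point prediction, the sketch below maps a predicted point to pixel coordinates on the screenshot. It assumes the prediction arrives as a normalized (x, y) pair in [0, 1], which this page does not confirm; `to_pixel` is a hypothetical helper name.

```python
def to_pixel(point, width, height):
    """Map a normalized (x, y) point to integer pixel coordinates,
    clamped so the result always lies inside the screenshot."""
    x, y = point
    px = min(max(int(x * width), 0), width - 1)
    py = min(max(int(y * height), 0), height - 1)
    return px, py

# Example: a point at 62.5% of the width and 75% of the height
# of a 1920x1080 screenshot.
click = to_pixel((0.625, 0.75), width=1920, height=1080)
```

Clamping keeps slightly out-of-range predictions on the screen, so a downstream automation tool always receives a valid click target.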

Use Cases

Automated testing
GUI element positioning
Automatically locates specific elements on the screen according to instructions
Achieves an accuracy of 36.7% on ScreenSpot-Pro
Assistive tools
Accessibility operation assistance
Helps visually impaired users operate graphical interfaces through voice instructions
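
For context on what an accuracy figure on a positioning benchmark measures, here is a small sketch of one common scoring convention: a prediction counts as correct when the predicted point falls inside the target element's bounding box. That convention, the `(left, top, right, bottom)` box layout, and the function names are assumptions for illustration, not the benchmark's official evaluation code, and the sample data is made up.

```python
def point_in_box(point, box):
    """Return True if (x, y) lies inside box = (left, top, right, bottom)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom

def positioning_accuracy(predictions, boxes):
    """Fraction of predicted points that land inside their target boxes."""
    if len(predictions) != len(boxes):
        raise ValueError("need one target box per prediction")
    if not predictions:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, boxes))
    return hits / len(predictions)

# Made-up example: three predicted click points and their target boxes.
preds = [(120, 45), (300, 200), (10, 10)]
boxes = [(100, 30, 160, 60), (400, 180, 460, 220), (0, 0, 20, 20)]
acc = positioning_accuracy(preds, boxes)  # two of the three points hit
```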