GUI Actor 7B Qwen2 VL

Developed by Microsoft
GUI-Actor-7B is a vision-language model built on Qwen2-VL-7B-Instruct. It targets graphical user interface (GUI) agent tasks and offers a coordinate-free approach to visual grounding.
Downloads 207
Release Time: 6/1/2025

Model Overview

The model adds an attention-based action head to Qwen2-VL-7B-Instruct and is then fine-tuned. This lets it perform well on GUI grounding tasks and makes it suitable for automated GUI operation.

Model Features

Coordinate-free Visual Grounding
Uses a coordinate-free approach that predicts GUI operation positions directly, which simplifies the interaction process.
Attention-based Action Head
A purpose-built attention-based action head improves the model's ability to locate GUI elements.
Multiple Model Sizes to Choose From
Model versions range from 2B to 7B parameters to suit different compute budgets.
Validator Enhancement
An optional dedicated validator model can be added to further improve operation accuracy.
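
To illustrate the coordinate-free idea behind the features above, here is a minimal sketch. It is not the GUI-Actor implementation. It assumes the action head yields one attention score per image patch on a row-major grid, and that an optional validator is a function scoring candidate points. All names (`ground_action`, `patch_center`, `patch_size`, `top_k`, `validator`) are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def softmax(scores: List[float]) -> List[float]:
    """Turn raw patch scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def patch_center(index: int, grid_cols: int, patch_size: int) -> Point:
    """Pixel center of a patch, given its row-major index on the grid."""
    row, col = divmod(index, grid_cols)
    return ((col + 0.5) * patch_size, (row + 0.5) * patch_size)


def ground_action(
    patch_scores: List[float],
    grid_cols: int,
    patch_size: int,
    top_k: int = 3,
    validator: Optional[Callable[[Point], float]] = None,
) -> Point:
    """Pick a click point from per-patch scores.

    Assumption: the model attends over patches rather than emitting
    coordinate text, so the target is read off the highest-weight patch.
    If a (hypothetical) validator is supplied, it re-ranks the top-k
    candidate points instead.
    """
    weights = softmax(patch_scores)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    candidates = [patch_center(i, grid_cols, patch_size) for i in ranked[:top_k]]
    if validator is None:
        return candidates[0]
    return max(candidates, key=validator)
```

For a 2x2 grid of 28-pixel patches with scores `[0.1, 2.0, 0.3, 1.5]`, `ground_action` returns the center of the second patch. Passing a validator lets a separate scoring function override the top-ranked patch with another of the top-k candidates.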

Model Capabilities

GUI Element Recognition
Screen Operation Positioning
Multimodal Understanding (Image + Text)
Automated Task Execution

Use Cases

Automated Software Testing
Automated UI Testing
Automatically identifies and operates software interface elements for functional testing.
Achieved 40.7% accuracy on the ScreenSpot-Pro benchmark.
Robotic Process Automation (RPA)
Business Process Automation
Automatically completes repetitive GUI operation tasks through visual understanding.
Achieved 89.5% accuracy on the ScreenSpot-v2 benchmark.
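
The page does not say how the benchmark accuracies above are scored. The sketch below assumes a common convention for GUI grounding evaluation: a prediction counts as correct when the predicted click point falls inside the target element's bounding box. The function names and the box layout `(left, top, right, bottom)` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]              # (x, y) in pixels
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def point_in_box(point: Point, box: Box) -> bool:
    """True if the point lies inside the box, edges included."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(predictions: List[Point], targets: List[Box]) -> float:
    """Fraction of predicted click points that land inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, targets))
    return hits / len(predictions)
```

Under this assumption, a score of 89.5% would mean that roughly nine in ten predicted points landed inside the intended element.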