AgentCPM-GUI

Developed by OpenBMB
AgentCPM-GUI is an on-device GUI agent whose reasoning is enhanced by reinforcement fine-tuning (RFT). It can operate both Chinese and English apps and is built on the 8-billion-parameter MiniCPM-V.
Downloads: 541
Release Date: 5/8/2025

Model Overview

An open-source, on-device LLM agent jointly developed by Tsinghua NLP Lab, Renmin University of China, and ModelBest. It takes smartphone screenshots as input and autonomously executes user-specified tasks.

Model Features

High-Quality GUI Localization
Pre-trained on large-scale bilingual Android datasets, significantly improving the ability to locate and understand common GUI components.
Chinese App Operation
The first open-source GUI agent fine-tuned specifically for Chinese applications, covering 30+ popular Chinese apps.
Enhanced Planning and Reasoning
Reinforcement Fine-Tuning (RFT) enables the model to think before outputting actions, greatly improving success rates in complex tasks.
Compact Action Space Design
Optimized action space and concise JSON format reduce the average action length to 9.7 tokens, enhancing on-device inference efficiency.
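
To illustrate why a concise JSON action format keeps outputs short, here is a minimal sketch of parsing and re-serializing one action. The key names (POINT, TYPE, PRESS, and so on) and the helper functions are assumptions made for illustration; this page does not specify the model's actual schema.

```python
import json

# Hypothetical action keys, assumed for illustration only.
ALLOWED_KEYS = {"thought", "POINT", "to", "TYPE", "PRESS", "STATUS", "duration"}


def parse_action(raw: str) -> dict:
    """Parse one JSON action string and reject keys outside the assumed set."""
    action = json.loads(raw)
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    unknown = set(action) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown action keys: {sorted(unknown)}")
    return action


def compact(action: dict) -> str:
    """Serialize without optional whitespace so the string stays short."""
    return json.dumps(action, separators=(",", ":"), ensure_ascii=False)


tap = parse_action('{"POINT": [480, 320]}')
print(compact(tap))  # {"POINT":[480,320]}
```

Dropping optional whitespace and keeping one short key per action is one plausible way to reach a low token count per step; the 9.7-token average quoted above is the page's figure, not something this sketch measures.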

Model Capabilities

Graphical Interface Understanding
Screen Element Localization
Multimodal Interaction
Task Planning
Automated Operation

Use Cases

Mobile App Automation
Chinese App Navigation
Execute navigation, search, and other tasks in Chinese apps like Gaode Map and Dianping
Achieved an average score of 71.3 in localization benchmark tests
Cross-Language Interface Operation
Accurately identify and operate target elements in mixed Chinese-English interfaces
Scored 76.5 in text-to-coordinate tasks
Accessibility Assistance
Visual Assistance Operation
Help visually impaired users operate mobile device interfaces via voice commands