
UI-TARS-2B-SFT

Developed by ByteDance-Seed
UI-TARS is a next-generation native Graphical User Interface (GUI) agent model designed to seamlessly interact with GUIs through human-like perception, reasoning, and action capabilities.
Downloads: 5,553
Release Time: 1/20/2025

Model Overview

UI-TARS is a Vision-Language Model (VLM) that integrates all key components—perception, reasoning, localization, and memory—into a single model, enabling end-to-end task automation without predefined workflows or manual rules.

Model Features

End-to-End GUI Interaction
Integrates perception, reasoning, localization, and memory capabilities in one model, enabling seamless graphical user interface interaction without hand-crafted pipelines
Multimodal Capabilities
Combines visual and language understanding to process both image and text inputs
High-Performance Localization
Performs strongly on interface-element localization benchmarks such as ScreenSpot Pro

Model Capabilities

Graphical User Interface Interaction
Visual Understanding
Text Understanding
Interface Element Localization
Multimodal Reasoning
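To illustrate how a GUI agent's text output can be mapped to a concrete interface action, the sketch below parses a hypothetical action string of the form `click(start_box='(x,y)')` into pixel coordinates. The action format and function name here are assumptions for illustration only, not the model's documented output schema:

```python
import re

def parse_click_action(action: str) -> tuple[int, int]:
    """Extract (x, y) pixel coordinates from a hypothetical agent
    action string such as click(start_box='(100,200)').

    NOTE: this action grammar is an illustrative assumption, not
    UI-TARS's documented output format.
    """
    match = re.fullmatch(r"click\(start_box='\((\d+),\s*(\d+)\)'\)", action)
    if match is None:
        raise ValueError(f"unrecognized action: {action!r}")
    return int(match.group(1)), int(match.group(2))

# A downstream controller could feed these coordinates to an
# automation backend (e.g. a click at that screen position).
x, y = parse_click_action("click(start_box='(100,200)')")
print(x, y)  # 100 200
```

In a full agent loop, a parser like this would sit between the model's generated action text and the automation layer that actually drives the GUI.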

Use Cases

Automated Testing
GUI Automated Testing
Automatically identifies and operates interface elements for software testing
Improves testing efficiency and coverage
Assistive Tools
Accessibility Assistance
Helps visually impaired users understand and operate graphical interfaces
Enhances accessibility experience