
UI-TARS-7B-DPO

Developed by ByteDance-Seed
UI-TARS is a next-generation native graphical user interface (GUI) agent model that interacts with GUIs through human-like perception, reasoning, and action.
Downloads: 38.74k
Release Time: 1/22/2025

Model Overview

UI-TARS integrates all key components—perception, reasoning, localization, and memory—into a single vision-language model (VLM), enabling end-to-end task automation without predefined workflows or manual rules.
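The end-to-end design can be pictured as one perceive, reason, act loop built around a single model call, with the action history serving as memory. The sketch below is hypothetical: `Action`, `run_agent`, and the `capture_screen` / `predict_action` / `execute` callbacks are illustrative names, not part of any official UI-TARS API, and the model is replaced by a scripted stub.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    # Hypothetical action vocabulary: "click", "type", "finished", ...
    kind: str
    x: Optional[int] = None
    y: Optional[int] = None
    text: str = ""


def run_agent(
    task: str,
    capture_screen: Callable[[], bytes],
    predict_action: Callable[[bytes, str, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Run a perceive -> reason -> act loop until the model reports it is done."""
    history: List[Action] = []  # memory: every prior action is fed back to the model
    for _ in range(max_steps):
        screenshot = capture_screen()                        # perception input
        action = predict_action(screenshot, task, history)  # one model call picks the next step
        history.append(action)
        if action.kind == "finished":
            break
        execute(action)                                      # act on the GUI
    return history


# Usage with a scripted stand-in for the model (hypothetical, for illustration only).
scripted = iter([Action("click", x=640, y=360), Action("finished")])
executed: List[Action] = []
demo_history = run_agent(
    task="Open the settings menu",
    capture_screen=lambda: b"",  # placeholder screenshot bytes
    predict_action=lambda shot, task, hist: next(scripted),
    execute=executed.append,
)
```

Because the model chooses every next action itself, the loop encodes no task-specific workflow; connecting a real model would only change `predict_action`.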

Model Features

End-to-End Task Automation
Integrates perception, reasoning, localization, and memory functions without predefined workflows or manual rules.
High-Performance GUI Interaction
Performs strongly across multiple GUI benchmarks and, in particular, outperforms other models on localization evaluations.
Multimodal Support
Supports both visual and textual interactions with graphical user interfaces.

Model Capabilities

Graphical User Interface Interaction
Visual Perception
Textual Reasoning
Localization Capability
Task Automation
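Localization means turning a reference to a UI element into a concrete screen position the agent can act on. As a hedged sketch, assume the model reports an element as a box `(x1,y1,x2,y2)` or a point `(x,y)` on a 0 to 1000 normalized grid; this output format is an assumption, so check the official UI-TARS documentation for the real one. `parse_box` and `box_to_click_point` are hypothetical helpers that map such a prediction to a pixel click target at the center of the box.

```python
import re
from typing import Sequence, Tuple


def parse_box(text: str) -> Tuple[float, float, float, float]:
    """Parse "(x1,y1,x2,y2)" or "(x,y)"; a point becomes a zero-size box.

    The string format is an assumption for illustration, not a documented API.
    """
    nums = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", text)]
    if len(nums) == 2:
        x, y = nums
        return x, y, x, y
    if len(nums) == 4:
        return nums[0], nums[1], nums[2], nums[3]
    raise ValueError(f"expected 2 or 4 numbers, got {len(nums)}: {text!r}")


def box_to_click_point(
    box: Sequence[float], screen_w: int, screen_h: int, scale: int = 1000
) -> Tuple[int, int]:
    """Map a box on an assumed 0..scale grid to its center pixel, clamped to the screen."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) * screen_w / (2 * scale)
    cy = (y1 + y2) * screen_h / (2 * scale)
    px = min(max(round(cx), 0), screen_w - 1)
    py = min(max(round(cy), 0), screen_h - 1)
    return px, py


print(box_to_click_point(parse_box("(100,200,300,400)"), 1920, 1080))  # (384, 324)
```

Keeping predictions resolution-independent and converting only at execution time lets the same output drive screens of any size.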

Use Cases

GUI Automation
Automated Testing
Used for automated testing of GUI functionality and performance.
Achieves strong results on the ScreenSpot-Pro benchmark.
User Interface Navigation
Assists users in navigating complex graphical user interfaces.
Achieves strong results on the VisualWebBench and WebSRC benchmarks.