CogAgent Chat HF

Developed by THUDM
CogAgent is an open-source vision-language model that improves on CogVLM, adding GUI agent capabilities alongside multi-round visual dialogue and visual grounding.
Downloads 503
Release Time: 12/15/2023

Model Overview

CogAgent is a high-performance vision-language model specializing in GUI agent tasks and visual dialogue, supporting 1120x1120 high-resolution image input.

Model Features

High-Resolution Visual Processing: Accepts image input at 1120x1120 resolution, allowing finer-grained visual understanding.
GUI Agent Functionality: Understands and operates GUIs across web, PC, and mobile applications.
Enhanced Visual Grounding: Locates objects in an image and describes their positions.
Multi-round Visual Dialogue: Supports extended multi-round dialogue about an image.
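
To make the visual grounding feature concrete, here is a minimal sketch of turning grounded model output into pixel coordinates. It assumes the model writes each box as [[x0,y0,x1,y1]] with values on a 0-999 scale relative to the image size; that format is an assumption and should be checked against real output. The names BOX_PATTERN and boxes_to_pixels are hypothetical helpers, not part of the model's API.

```python
import re

# Assumed output format: [[x0,y0,x1,y1]], each value normalized to 0-999.
BOX_PATTERN = re.compile(r"\[\[(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\]\]")


def boxes_to_pixels(text, width, height, scale=1000):
    """Extract every box from `text` and rescale it to pixel coordinates."""
    boxes = []
    for match in BOX_PATTERN.finditer(text):
        x0, y0, x1, y1 = (int(value) for value in match.groups())
        boxes.append((
            x0 * width // scale,
            y0 * height // scale,
            x1 * width // scale,
            y1 * height // scale,
        ))
    return boxes


if __name__ == "__main__":
    # Hypothetical model reply for a 1120x1120 screenshot.
    reply = "Tap the search button [[100,200,300,400]]"
    print(boxes_to_pixels(reply, width=1120, height=1120))
```

Integer division keeps the result usable directly as pixel indices; a GUI automation script could click the center of each returned box.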

Model Capabilities

Visual Question Answering
GUI Operation Planning
Image Content Description
Visual Grounding
Multi-round Dialogue
OCR Enhancement

Use Cases

GUI Automation
Web Automation Operation: Generates operation steps from webpage screenshots. Reported to perform strongly on the AITW and Mind2Web datasets.
Visual Question Answering
Complex Image Understanding: Answers questions about complex images. Reported to achieve top-tier performance across 9 cross-modal benchmarks.