UGround V1 72B Preview

Developed by osunlp
Qwen2-VL is the latest iteration of the Qwen-VL model series, featuring full-resolution image understanding, ultra-long video parsing, and multilingual text and image recognition capabilities.
Downloads 21
Release Time: 1/7/2025

Model Overview

A 72-billion-parameter multimodal vision-language model supporting image understanding, video analysis, multilingual text recognition, and agent operations.

Model Features

Full-resolution image understanding
Maps images of any resolution to a dynamic number of visual tokens, closer to how humans process visual input, and reaches state-of-the-art performance on benchmarks such as MathVista and DocVQA
Ultra-long video understanding
Parses videos longer than 20 minutes, supporting high-quality video Q&A, dialogue, and content creation
Agent device operation
Combines complex reasoning and decision-making to operate devices such as smartphones and robots automatically, driven by what it sees in the visual environment
Multilingual text and image understanding
Supports multilingual text recognition in images, covering major European languages, Japanese, Korean, Arabic, Vietnamese, and more
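
To make "dynamic visual token mapping" concrete, the following is a minimal sketch of how an image's resolution could translate into a variable token count. It is not taken from this page: the 28-pixel cell size and the pixel budgets are assumptions based on how Qwen2-VL-style image processors are commonly configured, and `estimate_visual_tokens` is a hypothetical helper, not a library function.

```python
import math


def estimate_visual_tokens(height, width, factor=28,
                           min_pixels=56 * 56, max_pixels=1280 * 28 * 28):
    """Estimate how many visual tokens an image yields (illustrative only).

    Assumptions (not from the model page): one token covers a
    factor x factor pixel cell, and the resized image must stay
    within [min_pixels, max_pixels].
    """
    # Snap each side to the nearest multiple of `factor`, at least one cell.
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)

    if h * w > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        scale = math.sqrt((height * width) / max_pixels)
        h = math.floor(height / scale / factor) * factor
        w = math.floor(width / scale / factor) * factor
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / factor) * factor
        w = math.ceil(width * scale / factor) * factor

    return (h // factor) * (w // factor)


small = estimate_visual_tokens(448, 448)    # fits the budget as-is
large = estimate_visual_tokens(2800, 2800)  # scaled down to fit the budget
```

Under these assumptions a small image keeps its native resolution and uses few tokens, while a large one is scaled down only as far as the pixel budget requires, so token count tracks image size instead of being fixed.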

Model Capabilities

Image understanding
Video analysis
Multilingual text recognition
Agent operations
Complex reasoning
Decision support

Use Cases

Document processing
Document Q&A
Parse document images and answer related questions
Achieves 96.5% accuracy on the DocVQA test set
Education
Math problem solving
Parse mathematical charts and solve problems
Achieves 70.5% accuracy on the MathVista test set
Smart devices
Android device operation
Control Android devices through visual understanding
Achieves 89.6% type matching accuracy on the AITZ benchmark
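
As a rough illustration of what a "type matching accuracy" figure measures, the sketch below computes the share of steps where the predicted action type equals the reference action type. The record layout, the `action_type` key, and the action names are assumptions made for this example; they are not the AITZ benchmark's actual data format or scoring script.

```python
def type_match_accuracy(predicted, reference):
    """Fraction of steps whose predicted action type matches the reference.

    Both arguments are lists of dicts with an "action_type" key
    (a hypothetical layout chosen for illustration).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    matches = sum(
        p["action_type"] == r["action_type"]
        for p, r in zip(predicted, reference)
    )
    return matches / len(reference)


# Hypothetical four-step episode: the third prediction has the wrong type.
reference_steps = [
    {"action_type": "CLICK"},
    {"action_type": "TYPE"},
    {"action_type": "SCROLL"},
    {"action_type": "CLICK"},
]
predicted_steps = [
    {"action_type": "CLICK"},
    {"action_type": "TYPE"},
    {"action_type": "CLICK"},
    {"action_type": "CLICK"},
]
accuracy = type_match_accuracy(predicted_steps, reference_steps)
```

A metric of this kind only checks that the right kind of action was chosen (click, type, scroll), not whether its target coordinates or typed text were correct.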
© 2025 AIbase