
Vitpose Plus Large

Developed by usyd-community
ViTPose++ is a vision Transformer-based foundation model for human pose estimation. Its largest models reach up to 81.1 AP on the MS COCO keypoint test-dev set when ensembled.
Release Time: 1/12/2025

Model Overview

ViTPose++ is a simple yet effective vision Transformer baseline for human pose estimation. It uses a plain, non-hierarchical vision Transformer as the backbone and a lightweight decoder that predicts keypoint heatmaps. The design emphasizes simplicity, scalability, and flexibility.
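Because the backbone is a single-scale plain ViT, the data flow can be traced with simple shape arithmetic. The sketch below assumes a common top-down COCO configuration reported for ViTPose: a 256x192 crop of one detected person, 16x16 patches, a decoder of two deconvolution blocks that each upsample by 2x, and 17 COCO keypoints. These defaults and the helper name `vitpose_shapes` are illustrative assumptions, not values read from this model's config.

```python
from dataclasses import dataclass


@dataclass
class PoseShapes:
    num_tokens: int        # patch tokens fed to the plain ViT backbone
    feature_hw: tuple      # size of the backbone's single-scale feature map
    heatmap_hw: tuple      # size of each predicted keypoint heatmap
    heatmap_channels: int  # one heatmap per keypoint


def vitpose_shapes(height=256, width=192, patch=16, deconv_blocks=2, num_keypoints=17):
    """Trace tensor sizes through a ViTPose-style model (assumed defaults)."""
    if height % patch or width % patch:
        raise ValueError("input size must be divisible by the patch size")
    fh, fw = height // patch, width // patch
    # Each deconvolution block doubles the resolution; a final 1x1 conv only maps channels.
    scale = 2 ** deconv_blocks
    return PoseShapes(fh * fw, (fh, fw), (fh * scale, fw * scale), num_keypoints)


print(vitpose_shapes())
# PoseShapes(num_tokens=192, feature_hw=(16, 12), heatmap_hw=(64, 48), heatmap_channels=17)
print(vitpose_shapes(384, 288).num_tokens)  # 432: higher resolution means more tokens
```

The token count grows with input area, which is why higher input resolutions cost noticeably more compute in a plain ViT backbone.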

Model Features

Simple yet Effective Architecture
Uses a plain vision Transformer as the backbone network, achieving high performance without complex domain-specific designs.
Highly Scalable
Model size can be scaled from about 100M to 1B parameters, setting a new Pareto front between accuracy and throughput.
Training Flexibility
Supports different attention types, input resolutions, and pretraining and fine-tuning strategies, and can handle multiple pose tasks.
Knowledge Transferability
Knowledge from large models can be transferred to smaller ones through a simple learnable knowledge token.
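The 100M-to-1B range comes mostly from widening and deepening the backbone. A rough estimate counts the weights in each Transformer block, which is about 12·d² for a width d with a 4x MLP. The widths and depths below are the standard ViT-B/L/H settings, which this sketch assumes match the Base/Large/Huge backbones with 16x16 patches and 192 position embeddings; the decoder and any task-specific layers are ignored, and `vit_backbone_params` is a hypothetical helper.

```python
def vit_backbone_params(width, depth, patch=16, num_tokens=192, mlp_ratio=4, in_chans=3):
    """Approximate parameter count (weights + biases) of a plain ViT encoder."""
    hidden = mlp_ratio * width
    per_block = (
        2 * width                          # LayerNorm before attention
        + width * 3 * width + 3 * width    # fused q, k, v projection
        + width * width + width            # attention output projection
        + 2 * width                        # LayerNorm before the MLP
        + width * hidden + hidden          # MLP expansion
        + hidden * width + width           # MLP contraction
    )
    patch_embed = in_chans * patch * patch * width + width  # 16x16 patch projection
    pos_embed = num_tokens * width                           # learned position embeddings
    final_norm = 2 * width
    return depth * per_block + patch_embed + pos_embed + final_norm


for name, width, depth in [("ViT-B", 768, 12), ("ViT-L", 1024, 24), ("ViT-H", 1280, 32)]:
    print(f"{name}: {vit_backbone_params(width, depth) / 1e6:.1f}M")
# ViT-B: 85.8M
# ViT-L: 303.3M
# ViT-H: 630.9M
```

Under these assumptions the Large backbone sits at roughly 300M parameters, with nearly all of them in the Transformer blocks rather than the embeddings.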

Model Capabilities

Human Pose Estimation
Keypoint Detection
Multi-person Pose Recognition
Occluded Human Pose Estimation
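With a heatmap decoder, keypoint detection reduces to finding the peak of each keypoint's heatmap and mapping it back into the original image. For multi-person images, a top-down pipeline repeats this for every detected person box. The following is a minimal sketch of plain argmax decoding; the box convention (x0, y0, width, height), the cell-centre mapping, and the function names are assumptions, and production pipelines usually add sub-pixel refinement on top of this.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # rows x cols of scores for one keypoint
Box = Tuple[float, float, float, float]  # (x0, y0, width, height) of the person crop


def decode_keypoint(heatmap: Heatmap, box: Box) -> Tuple[float, float, float]:
    """Return (x, y, score) in image coordinates for one keypoint heatmap.

    Assumes the heatmap covers exactly the person crop described by `box`.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    # Plain argmax over the grid (no sub-pixel refinement).
    score, r, c = max((v, r, c) for r, row in enumerate(heatmap) for c, v in enumerate(row))
    x0, y0, w, h = box
    # Map the centre of the winning cell from heatmap space back to image space.
    x = x0 + (c + 0.5) * w / cols
    y = y0 + (r + 0.5) * h / rows
    return x, y, score


def decode_pose(heatmaps: List[Heatmap], box: Box) -> List[Tuple[float, float, float]]:
    """Decode one (x, y, score) triple per keypoint channel for one person."""
    return [decode_keypoint(hm, box) for hm in heatmaps]


# A toy 4x3 heatmap whose peak is at row 1, column 2.
hm = [[0.0, 0.1, 0.0],
      [0.0, 0.2, 0.9],
      [0.0, 0.0, 0.1],
      [0.0, 0.0, 0.0]]
print(decode_keypoint(hm, (100.0, 50.0, 60.0, 80.0)))  # (150.0, 80.0, 0.9)
```

A low peak score is a useful signal for occluded or out-of-frame keypoints, which downstream code can filter by threshold.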

Use Cases

Computer Vision
Human Pose Analysis
Detects human keypoint positions in images or videos.
The largest single model achieves 80.9 AP on the MS COCO test-dev set.
Action Recognition
Identifies human actions by analyzing pose changes across consecutive frames.
Health & Fitness
Sports Pose Analysis
Tracks and analyzes posture and technique during fitness exercises.
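The AP figures above are computed from Object Keypoint Similarity (OKS), the COCO keypoint analogue of box IoU. Each labeled keypoint contributes exp(-d²/(2·s²·k²)), where d is the prediction error, s² is the object's area, and k = 2σ for a per-keypoint constant σ; AP then averages precision over OKS thresholds from 0.50 to 0.95. Fitness and action-recognition applications instead typically derive quantities such as joint angles from the decoded keypoints in each frame. Minimal sketches of both follow. They are simplified (no crowd regions or prediction-to-ground-truth matching), and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def oks(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[int],
        area: float, sigmas: Sequence[float]) -> float:
    """Object Keypoint Similarity between one predicted and one ground-truth pose."""
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # keypoints without a ground-truth label are ignored
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2 * sigma) ** 2
        total += math.exp(-d2 / (2 * area * k2))
        count += 1
    return total / count if count else 0.0


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at keypoint b formed by a-b-c, e.g. shoulder-elbow-wrist."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints coincide; angle is undefined")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp float error


# Tracking one joint across frames turns per-frame poses into a motion signal.
frames = [
    {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (2.0, 0.0)},  # arm straight
    {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (1.0, 1.0)},  # elbow bent
]
angles = [joint_angle(f["shoulder"], f["elbow"], f["wrist"]) for f in frames]
print([round(x) for x in angles])  # [180, 90]
```

A change in the elbow angle from about 180 to 90 degrees across frames is the kind of pose change that rep counting or action recognition would build on.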