Sapiens-pose-1b Open Source Human Pose Estimation Model - A Practical Tool Supporting 308 Keypoint Detection

Sapiens Pose 1b

Developed by Meta (Hugging Face organization: facebook)

Pose-Sapiens-1B is a high-resolution human pose estimation model based on the Vision Transformer architecture. Its backbone was pre-trained on 300 million human images at 1024 x 1024 resolution, and the model detects 308 keypoints covering the body, face, hands, and feet.

Tags: Pose Estimation, English, High-resolution pose estimation, Full-body keypoint detection, Billion-parameter ViT

Downloads 82

Release Time : 9/10/2024

Model Overview

This model is designed for high-precision human pose estimation. According to the model card, it generalizes well to real-world (in-the-wild) images even when the labeled training data is scarce or entirely synthetic.

Model Features

High-resolution support

Natively supports 1K high-resolution inference with a 1024 x 768 (H x W) input.

Multi-part keypoint detection

Simultaneously detects 308 keypoints for the body, face, hands, and feet.

Strong generalization capability

Performs well on real-world data even when the labeled training data is scarce or entirely synthetic.

Large-scale pre-training

Pre-trained on 300 million human images, learning rich pose feature representations.

Model Capabilities

Human pose estimation

Facial keypoint detection

Hand keypoint detection

Foot keypoint detection

High-resolution image processing

Use Cases

Motion analysis and sports science

Athlete pose analysis

Used to analyze athletes' movement poses to optimize training effectiveness.

Provides precise location data for 308 keypoints

Virtual and augmented reality

Virtual avatar control

Used for precise motion capture to drive virtual avatars.

Achieves high-fidelity human motion reproduction

Medical rehabilitation

Rehabilitation training monitoring

Monitors whether patients' rehabilitation training movements are correct.

Provides accurate pose evaluation data

🚀 Pose-Sapiens-1B

Sapiens is a family of vision transformers pretrained on a large-scale human image dataset. When finetuned for human-centric vision tasks, the models generalize well to in-the-wild conditions.

🚀 Quick Start

The Pose-Sapiens-1B model is designed for keypoint detection, specifically for estimating 308 keypoints (body + face + hands + feet) on a single image.

✨ Features

Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution. When finetuned for human-centric vision tasks, the models generalize to in-the-wild conditions.
Sapiens-1B natively supports 1K high-resolution inference. The resulting models show remarkable generalization to in-the-wild data, even with scarce labeled data or entirely synthetic data.
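
Because the model card lists a fixed 1024 x 768 (H x W) input, a person crop usually has to be brought to that 3:4 width-to-height ratio before resizing. The sketch below shows one common way to do this in top-down pose pipelines: grow the detected box around its centre until it matches the target aspect ratio. It assumes a person box is already available from some detector. The function name `fit_box_to_aspect` and the box format `(x0, y0, x1, y1)` are hypothetical and are not taken from the Sapiens repository.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def fit_box_to_aspect(
    x0: float, y0: float, x1: float, y1: float,
    target_h: int = 1024, target_w: int = 768,
) -> Box:
    """Expand a box around its centre so width / height == target_w / target_h."""
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    w, h = x1 - x0, y1 - y0
    aspect = target_w / target_h  # 0.75 for a 1024 x 768 (H x W) input
    if w > aspect * h:
        h = w / aspect  # box is too wide: grow the height
    else:
        w = h * aspect  # box is too tall: grow the width
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A wide 300 x 200 box becomes 300 x 400, which is the 3:4 ratio of the input.
print(fit_box_to_aspect(100, 100, 400, 300))
```

The expanded box can extend past the image border (including negative coordinates), so a real pipeline would pad or clamp the crop before resizing it to 1024 x 768.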

📚 Documentation

Model Details

Developed by: Meta
Model type: Vision Transformer
License: Creative Commons Attribution-NonCommercial 4.0
Task: pose
Format: original
File: sapiens_1b_goliath_best_goliath_AP_639.pth

Model Card

Property	Details
Image Size	1024 x 768 (H x W)
Num Parameters	1.169 B
FLOPs	4.647 TFLOPs
Patch Size	16 x 16
Embedding Dimensions	1536
Num Layers	40
Num Heads	24
Feedforward Channels	6144
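
As a quick sanity check on the table above, the figures can be combined to get the token sequence length and a rough weight count. This is a back-of-the-envelope sketch that assumes a standard ViT block (query, key, value, and output projections plus a two-layer MLP). It counts weight matrices only and ignores biases, normalization layers, embeddings, and the pose head, so it is expected to land somewhat below the listed 1.169 B.

```python
# Figures copied from the Model Card table.
IMAGE_H, IMAGE_W = 1024, 768
PATCH = 16
EMBED_DIM = 1536
NUM_LAYERS = 40
NUM_HEADS = 24
FFN_DIM = 6144

# Patch grid and the resulting sequence length seen by the transformer.
grid_h, grid_w = IMAGE_H // PATCH, IMAGE_W // PATCH
num_tokens = grid_h * grid_w

# Width of each attention head.
head_dim = EMBED_DIM // NUM_HEADS

# Weight matrices only (assumption: standard ViT block layout).
attn_params = 4 * EMBED_DIM * EMBED_DIM  # Q, K, V and output projections
mlp_params = 2 * EMBED_DIM * FFN_DIM     # two linear layers of the MLP
encoder_params = NUM_LAYERS * (attn_params + mlp_params)

print(f"patch grid: {grid_h} x {grid_w} = {num_tokens} tokens")
print(f"head dim: {head_dim}")
print(f"approx. encoder weights: {encoder_params / 1e9:.3f} B")
```

Under these assumptions the input is split into a 64 x 48 grid (3072 tokens), each head is 64 channels wide, and the encoder weight matrices alone come to about 1.132 B parameters, which is consistent with the 1.169 B total in the table.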

More Resources

Repository: https://github.com/facebookresearch/sapiens
Paper: https://arxiv.org/abs/2408.12569
Demo: https://huggingface.co/spaces/facebook/sapiens-pose
Project Page: https://about.meta.com/realitylabs/codecavatars/sapiens
Additional Results: https://rawalkhirodkar.github.io/sapiens
HuggingFace Collection: https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc

💻 Usage Examples

The Pose 1B model can be used for estimating 308 keypoints (body + face + hands + feet) on a single image.
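
This page does not describe the model's output format or an inference API, so the following is only a minimal post-processing sketch. It assumes the model produces one heatmap per keypoint (308 in total), as is common for ViT-based pose estimators, and shows how such heatmaps could be turned into pixel coordinates on the 1024 x 768 input. The function `decode_heatmaps` and the nested-list heatmap layout are hypothetical; they are not part of the Sapiens repository.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # a single heatmap, indexed as heatmap[row][col]


def decode_heatmaps(
    heatmaps: List[Heatmap], image_h: int = 1024, image_w: int = 768
) -> List[Tuple[float, float, float]]:
    """Return one (x, y, score) triple per heatmap, in input-image pixels."""
    keypoints = []
    for heatmap in heatmaps:
        hm_h, hm_w = len(heatmap), len(heatmap[0])
        # Locate the highest-scoring cell of this keypoint's heatmap.
        score, row, col = max(
            (value, r, c)
            for r, cells in enumerate(heatmap)
            for c, value in enumerate(cells)
        )
        # Map the cell centre back to the model's input resolution.
        x = (col + 0.5) * image_w / hm_w
        y = (row + 0.5) * image_h / hm_h
        keypoints.append((x, y, score))
    return keypoints


# Toy input: two 4 x 3 heatmaps standing in for the 308 a real run would yield.
toy = [
    [[0.0, 0.1, 0.0], [0.0, 0.2, 0.9], [0.0, 0.1, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0], [0.7, 0.1, 0.0]],
]
for x, y, score in decode_heatmaps(toy):
    print(f"x={x:.1f} y={y:.1f} score={score:.2f}")
```

A plain argmax keeps the sketch short. Production pipelines often refine the peak to sub-pixel accuracy and then map the coordinates from the crop back to the original image.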

📄 License

This model is released under the Creative Commons Attribution-NonCommercial 4.0 license.
