Sapiens-pose-0.6b Open-source Vision Model - Precision Handling of Human-centered Vision Tasks

Sapiens Pose 0.6b

Developed by facebook

Sapiens is a family of vision Transformer models pre-trained on 300 million high-resolution human images, focusing on human-centric vision tasks.

Pose Estimation | English | #High-resolution pose estimation #Full-body keypoint detection #Synthetic data generalization

Release Time: 9/18/2024

Model Overview

Pose-Sapiens-0.6B is a vision Transformer model for pose estimation, supporting the estimation of 308 keypoints (body + face + hands + feet) on a single image.

Model Features

High-resolution support

Native support for 1K high-resolution inference, with image sizes up to 1024 x 768.

Outstanding generalization capability

Generalizes well to real-world data even when the labeled training data is scarce or entirely synthetic.

Multi-keypoint detection

Supports estimation of 308 keypoints across body, face, hands, and feet.

Model Capabilities

Human pose estimation

Facial keypoint detection

Hand keypoint detection

Foot keypoint detection

Use Cases

Computer vision

Human pose analysis

Used for human pose estimation in scenarios such as sports analysis and fitness coaching.

Virtual reality

Provides precise human pose data for virtual reality applications.

🚀 Pose-Sapiens-0.6B

Pose-Sapiens-0.6B is a vision transformer model for human pose estimation, pretrained on a large-scale human image dataset and capable of high-resolution inference.

🚀 Quick Start

The Pose-Sapiens-0.6B model is designed to estimate 308 keypoints (including body, face, hands, and feet) on a single image. It belongs to the Sapiens family of vision transformers, which are pretrained on 300 million human images at a resolution of 1024 x 1024.
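
The model card does not describe preprocessing, so the following is only a minimal sketch of one step that top-down pose pipelines commonly need: fitting a person bounding box to the model's 1024 x 768 (H x W) input shape before cropping and resizing. It assumes the box comes from a separate person detector. The function names `fit_box_to_aspect` and `to_model_coords` are hypothetical and are not part of the Sapiens repository.

```python
def fit_box_to_aspect(x, y, w, h, target_h=1024, target_w=768):
    """Expand box (x, y, w, h) around its centre to the target H:W aspect ratio."""
    cx, cy = x + w / 2.0, y + h / 2.0
    aspect = target_w / target_h          # 0.75 for a 1024 x 768 (H x W) input
    if w > h * aspect:
        h = w / aspect                    # box is too wide: grow the height
    else:
        w = h * aspect                    # box is too tall: grow the width
    return cx - w / 2.0, cy - h / 2.0, w, h


def to_model_coords(px, py, box, target_h=1024, target_w=768):
    """Map an image-space point into pixel coordinates of the resized crop."""
    bx, by, bw, bh = box
    return (px - bx) * target_w / bw, (py - by) * target_h / bh


# Hypothetical detector output: a 200 x 400 box with its top-left corner at (100, 50).
box = fit_box_to_aspect(100, 50, 200, 400)
print(box)                                # (50.0, 50.0, 300.0, 400.0)
print(to_model_coords(200, 250, box))     # (384.0, 512.0), the crop centre
```

Expanding the box rather than stretching the crop keeps the person's proportions intact when the crop is resized to the model input.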

✨ Features

The Sapiens family of models generalizes well to in-the-wild conditions when finetuned for human-centric vision tasks.
Sapiens-0.6B natively supports 1K high-resolution inference and shows remarkable generalization even with scarce or entirely synthetic labeled data.

📚 Documentation

Model Details

Developed by: Meta
Model type: Vision Transformer
License: Creative Commons Attribution-NonCommercial 4.0
Task: pose
Format: original
File: sapiens_0.6b_goliath_best_goliath_AP_609.pth

Model Card

Property	Details
Image Size	1024 x 768 (H x W)
Num Parameters	0.664 B
FLOPs	2.583 TFLOPs
Patch Size	16 x 16
Embedding Dimensions	1280
Num Layers	32
Num Heads	16
Feedforward Channels	5120
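
The table values can be cross-checked with some back-of-the-envelope arithmetic. The sketch below is a rough estimate, not the authors' accounting: it assumes a plain ViT encoder with learned position embeddings and an RGB input, and it ignores biases, normalization layers, and the pose head. The helper names are hypothetical.

```python
# Values taken from the Model Card table.
IMAGE_H, IMAGE_W = 1024, 768
PATCH = 16
EMBED_DIM = 1280
NUM_LAYERS = 32
NUM_HEADS = 16
FFN_DIM = 5120


def num_patches(height, width, patch):
    """Number of non-overlapping patch tokens for one image."""
    return (height // patch) * (width // patch)


def approx_encoder_params(embed_dim, ffn_dim, num_layers, patch, tokens, in_channels=3):
    """Rough weight count for a plain ViT encoder (biases and norms ignored)."""
    attention = 4 * embed_dim * embed_dim     # Q, K, V and output projections
    feedforward = 2 * embed_dim * ffn_dim     # two linear layers per block
    blocks = num_layers * (attention + feedforward)
    patch_embed = patch * patch * in_channels * embed_dim
    pos_embed = tokens * embed_dim            # assumption: learned position embeddings
    return blocks + patch_embed + pos_embed


tokens = num_patches(IMAGE_H, IMAGE_W, PATCH)
params = approx_encoder_params(EMBED_DIM, FFN_DIM, NUM_LAYERS, PATCH, tokens)

print(tokens)                     # 3072 tokens (64 x 48 patches)
print(EMBED_DIM // NUM_HEADS)     # 80 channels per attention head
print(round(params / 1e9, 3))     # 0.634
```

The estimate of about 0.634 B falls somewhat below the listed 0.664 B. The gap presumably comes from the terms omitted here, but the model card does not break the figure down.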

More Resources

Repository: https://github.com/facebookresearch/sapiens
Paper: https://arxiv.org/abs/2408.12569
Demo: https://huggingface.co/spaces/facebook/sapiens-pose
Project Page: https://about.meta.com/realitylabs/codecavatars/sapiens
Additional Results: https://rawalkhirodkar.github.io/sapiens
HuggingFace Collection: https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc

📄 License

The model is released under the Creative Commons Attribution-NonCommercial 4.0 license.
