Open-source Sapiens-0.6B Model: Focused on Human-Centric Vision Tasks with Precise, High-Resolution Recognition and Processing

Sapiens Depth 0.6b

Developed by facebook

Sapiens is a family of Vision Transformer models pre-trained on 300 million human images at 1024 x 1024 resolution, specializing in human-centric vision tasks.



Release Time: 9/10/2024

Model Overview

This model performs relative depth estimation on images of people. It natively supports 1K high-resolution inference and generalizes well to real-world images.
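Because the model predicts relative depth, its output is only meaningful up to an unknown scale and offset. A common convention for comparing a relative depth map with ground truth is to first fit a scale s and shift t by least squares and only then measure the error. This is an assumption here: the card does not say how Sapiens itself is evaluated. The minimal sketch below works on flat lists of foreground-pixel values, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def align_scale_shift(pred: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (s, t) minimising sum((s * p + t - g) ** 2) over paired pixels."""
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("pred and target must be paired and contain at least 2 values")
    mean_p = sum(pred) / n
    mean_g = sum(target) / n
    # Closed-form solution of a simple linear regression of target on pred.
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, target))
    var = sum((p - mean_p) ** 2 for p in pred)
    if var == 0:
        raise ValueError("prediction is constant, so the scale is undefined")
    s = cov / var
    t = mean_g - s * mean_p
    return s, t


def aligned_rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """RMSE between the aligned prediction s * pred + t and the target."""
    s, t = align_scale_shift(pred, target)
    sq = sum((s * p + t - g) ** 2 for p, g in zip(pred, target))
    return math.sqrt(sq / len(pred))


# Hypothetical foreground-pixel values: the target is exactly 5 * pred + 1.5.
pred = [0.1, 0.2, 0.4]
target = [2.0, 2.5, 3.5]
s, t = align_scale_shift(pred, target)
print(f"s={s:.3f}, t={t:.3f}, rmse={aligned_rmse(pred, target):.3g}")
```

Since the prediction in this toy example is an exact affine function of the target, the fit recovers s = 5 and t = 1.5 and the aligned error is essentially zero. On a real depth map, only the pixels inside a person mask would be passed in, so that background values do not distort the fit.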

Model Features

High-Resolution Support

Natively supports 1K high-resolution inference; this depth model takes human images of 1024 x 768 (H x W) as input.
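Real photos rarely arrive at exactly 1024 x 768 (H x W), the input size listed in the table. One common way to fit them without distortion is a "letterbox": scale the image uniformly until it fits, then pad the remainder. This is an assumption for illustration only. The official Sapiens preprocessing may simply resize or crop instead, and `letterbox_params` and `crop_box` are hypothetical helpers. The sketch computes only the geometry, so it runs without any image library.

```python
from typing import Tuple

# Input size listed in the model's spec table (H x W).
TARGET_H, TARGET_W = 1024, 768


def letterbox_params(h: int, w: int,
                     target_h: int = TARGET_H,
                     target_w: int = TARGET_W) -> Tuple[int, int, int, int]:
    """Return (new_h, new_w, pad_top, pad_left) for an aspect-preserving resize
    of an h x w image into a target_h x target_w canvas, centred with padding."""
    if h <= 0 or w <= 0:
        raise ValueError("image dimensions must be positive")
    # Largest uniform scale at which the whole image still fits inside the canvas.
    scale = min(target_h / h, target_w / w)
    new_h = min(target_h, round(h * scale))
    new_w = min(target_w, round(w * scale))
    # Split the leftover space evenly between the two sides.
    pad_top = (target_h - new_h) // 2
    pad_left = (target_w - new_w) // 2
    return new_h, new_w, pad_top, pad_left


def crop_box(h: int, w: int) -> Tuple[int, int, int, int]:
    """(top, left, bottom, right) of the valid, unpadded region in a 1024 x 768 output map."""
    new_h, new_w, top, left = letterbox_params(h, w)
    return top, left, top + new_h, left + new_w


print(letterbox_params(1080, 1920))  # (432, 768, 296, 0): landscape frame, heavy top/bottom padding
print(letterbox_params(2048, 1536))  # (1024, 768, 0, 0): already 3:4 portrait, no padding
```

`crop_box` returns the region of the predicted depth map that corresponds to real image content. Cropping to that box and resizing back to the original resolution undoes the letterbox. Portrait framings close to 4:3 (H:W) waste the least resolution, which suits full-body human images.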

Strong Generalization Capability

Generalizes well to in-the-wild images even when labeled training data is scarce or entirely synthetic.

Large-Scale Pre-training

Pre-trained on 300 million human images, which gives the model strong general-purpose feature extraction.

Model Capabilities

Human Image Depth Estimation

High-Resolution Image Processing

Use Cases

Computer Vision

Human Depth Estimation

Estimates the relative depth of people in images, with applications in virtual reality, augmented reality, and similar scenarios.

Performs well under real-world conditions.

Property	Details
Image Size	1024 x 768 (H x W)
Num Parameters	0.664 B
FLOPs	2.583 TFLOPs
Patch Size	16 x 16
Embedding Dimensions	1280
Num Layers	32
Num Heads	16
Feedforward Channels	5120
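The table's architecture numbers can be sanity-checked with a little arithmetic. The sketch below assumes a standard pre-norm ViT encoder rather than the released Sapiens code: biased linear layers, two LayerNorms per block, a final LayerNorm, and learned positional embeddings sized for the 1024 x 768 input. The class and function names are hypothetical. The calculation shows that a 1024 x 768 image becomes a sequence of 3072 patch tokens. It also estimates that the encoder alone accounts for roughly 0.63 B parameters. The remaining gap to the listed 0.664 B presumably comes from components or layout details that the sketch leaves out, so treat the estimate as a rough check, not a reproduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    # Values copied from the spec table.
    image_h: int = 1024
    image_w: int = 768
    patch: int = 16
    dim: int = 1280
    layers: int = 32
    heads: int = 16
    ffn: int = 5120
    in_channels: int = 3  # assumption: RGB input


def num_tokens(cfg: ViTConfig) -> int:
    """Number of non-overlapping patches, i.e. the transformer sequence length."""
    return (cfg.image_h // cfg.patch) * (cfg.image_w // cfg.patch)


def encoder_params(cfg: ViTConfig) -> int:
    """Rough parameter count of a plain pre-norm ViT encoder (assumed layout)."""
    d = cfg.dim
    attn = 4 * d * d + 4 * d               # Q, K, V and output projections, with biases
    mlp = 2 * d * cfg.ffn + cfg.ffn + d     # two linear layers, with biases
    norms = 2 * 2 * d                       # two LayerNorms, weight + bias each
    per_layer = attn + mlp + norms
    patch_embed = cfg.patch ** 2 * cfg.in_channels * d + d  # patch projection + bias
    pos_embed = num_tokens(cfg) * d         # one learned vector per patch (assumption)
    final_norm = 2 * d
    return cfg.layers * per_layer + patch_embed + pos_embed + final_norm


cfg = ViTConfig()
print(num_tokens(cfg))                       # 3072 tokens per 1024 x 768 image
print(cfg.dim // cfg.heads)                  # 80 channels per attention head
print(f"{encoder_params(cfg) / 1e9:.3f} B")  # 0.635 B for this encoder-only sketch
```

The 3072-token sequence is what makes 1K inference costly. Self-attention compares every token with every other, so in each of the 32 layers, each of the 16 heads builds a 3072 x 3072 score matrix.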