Sapiens-depth-1b-torchscript Open-source Vision Model: Empowering Human

Sapiens Depth 1b Torchscript

Developed by facebook

Sapiens is a family of vision transformer models pre-trained on 300 million human images at 1024 x 1024 resolution and aimed at human-centric vision tasks.

3D Vision, English
#High-resolution depth estimation #Specialized for human images #1-billion-parameter model


Release Time: 9/9/2024

Model Overview

This model estimates relative depth in human images. It supports inference at 1K resolution and generalizes well to real-world data.

Model Features

High-resolution support

Natively supports inference at 1K resolution (1024 x 768 input, H x W), making it suitable for high-quality image processing.

Outstanding generalization capability

Generalizes well to real-world data even when labeled training data is scarce or entirely synthetic.

Large-scale pre-training

Pre-trained on 300 million human images, which gives the model strong feature extraction capabilities.

Model Capabilities

Human image depth estimation

High-resolution image processing

Visual feature extraction

Use Cases

Computer vision

Human depth perception

Estimates the relative depth of body parts in human images

Produces detailed per-pixel depth maps

Virtual reality applications

Supplies depth information for character modeling in VR/AR systems
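
Because the model predicts relative rather than metric depth, the raw values carry no absolute units, and a common step before display or downstream use is to rescale them. The following is a minimal sketch of min-max normalization with NumPy. The function name `normalize_relative_depth` and the optional foreground `mask` are hypothetical and are not part of the model or its release; the mask would have to come from a separate segmentation step.

```python
import numpy as np


def normalize_relative_depth(depth, mask=None):
    """Min-max scale a relative depth map to [0, 1].

    `mask` is an optional boolean foreground mask (an assumption: it is
    not produced by the depth model). Pixels outside the mask become NaN.
    """
    depth = np.asarray(depth, dtype=np.float32)
    if mask is None:
        mask = np.ones(depth.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)

    out = np.full(depth.shape, np.nan, dtype=np.float32)
    if not mask.any():
        return out  # nothing to normalize

    values = depth[mask]
    lo, hi = values.min(), values.max()
    span = hi - lo
    # A constant depth map has no range to stretch, so map it to 0.
    out[mask] = (values - lo) / span if span > 0 else 0.0
    return out
```

Restricting the minimum and maximum to the person's pixels keeps background values from compressing the range available for the body.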

Property	Details
Developed by	Meta
Model Type	Vision Transformer
License	Creative Commons Attribution-NonCommercial 4.0
Task	depth
Format	torchscript
File	sapiens_1b_render_people_epoch_88_torchscript.pt2
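
Since the checkpoint is distributed in TorchScript format, it can presumably be loaded with `torch.jit.load` without the original model code. The sketch below shows one way this might look, using the file name above and the 1024 x 768 input size listed in this card. Several details are assumptions not stated here: the ImageNet-style normalization constants, RGB channel order, and a single tensor output. The helper names `preprocess` and `estimate_depth` are hypothetical.

```python
import numpy as np

# Assumptions (not given in this card): ImageNet-style mean/std on a
# 0-255 scale and RGB channel order. Check the official release first.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)
EXPECTED_SHAPE = (1024, 768, 3)  # H x W from the card, plus RGB channels


def preprocess(image_rgb):
    """Turn an HxWx3 uint8 RGB image (already resized) into a 1x3xHxW float32 batch."""
    if image_rgb.shape != EXPECTED_SHAPE:
        raise ValueError(f"expected {EXPECTED_SHAPE}, got {image_rgb.shape}")
    x = (image_rgb.astype(np.float32) - MEAN) / STD
    return x.transpose(2, 0, 1)[None]  # HWC -> NCHW with batch size 1


def estimate_depth(image_rgb,
                   checkpoint="sapiens_1b_render_people_epoch_88_torchscript.pt2",
                   device="cpu"):
    """Run the TorchScript checkpoint on one image and return a 2-D depth array."""
    import torch  # imported lazily so preprocess() works without PyTorch

    model = torch.jit.load(checkpoint, map_location=device).eval()
    batch = torch.from_numpy(preprocess(image_rgb)).to(device)
    with torch.inference_mode():
        depth = model(batch)  # assumed to be a single tensor, e.g. 1x1xHxW
    return depth.squeeze().cpu().numpy()
```

The caller is responsible for resizing the image to 1024 x 768 beforehand. With roughly 1.2 billion parameters, a GPU device is likely the practical choice for inference.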

Property	Details
Image Size	1024 x 768 (H x W)
Num Parameters	1.169 B
FLOPs	4.647 TFLOPs
Patch Size	16 x 16
Embedding Dimensions	1536
Num Layers	40
Num Heads	24
Feedforward Channels	6144
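
The figures in this table can be cross-checked with a little arithmetic. The sketch below derives the token grid, the per-head dimension, and a rough encoder weight count. It assumes a standard vision transformer block (four d x d attention projections plus a two-layer feedforward network), which this card does not spell out, and it ignores biases, normalization layers, embeddings, and the depth head, so it should land somewhat below the reported 1.169 B.

```python
# Figures copied from the table above.
IMAGE_H, IMAGE_W = 1024, 768
PATCH = 16
EMBED_DIM = 1536
NUM_LAYERS = 40
NUM_HEADS = 24
FFN_DIM = 6144

# Each 16 x 16 patch becomes one token.
tokens_h, tokens_w = IMAGE_H // PATCH, IMAGE_W // PATCH
num_tokens = tokens_h * tokens_w

head_dim = EMBED_DIM // NUM_HEADS   # channels per attention head
ffn_ratio = FFN_DIM // EMBED_DIM    # feedforward expansion factor

# Assumed standard ViT block: Q, K, V, and output projections (4 * d^2)
# plus a two-layer feedforward network (2 * d * ffn). Weights only.
attn_weights = 4 * EMBED_DIM * EMBED_DIM
ffn_weights = 2 * EMBED_DIM * FFN_DIM
encoder_weights = NUM_LAYERS * (attn_weights + ffn_weights)

print(f"token grid: {tokens_h} x {tokens_w} = {num_tokens} tokens")  # 64 x 48 = 3072
print(f"head dim: {head_dim}, feedforward ratio: {ffn_ratio}")       # 64, 4
print(f"encoder weights: {encoder_weights / 1e9:.3f} B")             # 1.132 B
```

The estimate of about 1.13 B weights in the transformer blocks is consistent with the listed total of 1.169 B once the omitted components are accounted for.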