Sapiens-depth-2b-torchscript: Open-Source Human Vision Model for Human-Centric Vision Tasks with Strong Generalization

Sapiens Depth 2b Torchscript

Developed by facebook

Sapiens is a Vision Transformer model pre-trained on 300 million human images at 1024×1024 resolution. It is designed for human-centric vision tasks and generalizes well to data outside its training distribution.

Tags: 3D Vision, English, #Human Depth Estimation, #High-Resolution Vision, #Synthetic Data Generalization

Downloads 58

Release Date: 9/9/2024

Model Overview

This model estimates relative depth from images of humans. It natively supports 1K high-resolution inference and maintains good performance when annotated data is scarce or the training data is entirely synthetic.
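As a rough illustration of what feeding an image to the model at its native resolution might involve, the sketch below converts an RGB image into a batched, channel-first float array. The 1024 x 768 (H x W) size comes from this card's property table. The normalization constants and the function name `to_model_input` are assumptions made for illustration and should be checked against the official Sapiens preprocessing.

```python
import numpy as np

# Input size from this card's property table: 1024 x 768 (H x W).
INPUT_H, INPUT_W = 1024, 768

# ASSUMED per-channel RGB statistics on the 0-255 scale. These are not stated
# in this card; confirm them against the official preprocessing.
ASSUMED_MEAN = np.array([123.5, 116.5, 103.5], dtype=np.float32)
ASSUMED_STD = np.array([58.5, 57.0, 57.5], dtype=np.float32)


def to_model_input(rgb_image: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image into a 1x3xHxW float32 batch.

    The image is expected to be resized to 1024 x 768 (H x W) beforehand.
    """
    if rgb_image.shape != (INPUT_H, INPUT_W, 3):
        raise ValueError(
            f"expected shape {(INPUT_H, INPUT_W, 3)}, got {rgb_image.shape}"
        )
    normalized = (rgb_image.astype(np.float32) - ASSUMED_MEAN) / ASSUMED_STD
    chw = np.transpose(normalized, (2, 0, 1))     # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis])  # add the batch dimension
```

Since this is a TorchScript export, the resulting array could presumably be wrapped with `torch.from_numpy` and passed to a module loaded with `torch.jit.load`. The checkpoint path and the exact output format are not given in this card.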

Model Features

High-Resolution Support

Native support for 1K high-resolution (1024×768) inference

Strong Generalization Capability

Generalizes to real-world data even when labeled data is scarce or entirely synthetic

Large-Scale Pretraining

Pretrained on 300 million 1024×1024 resolution human images

Model Capabilities

Human Image Depth Estimation

High-Resolution Image Processing

Use Cases

Computer Vision

Human Depth Estimation

Estimates relative depth from a single image of a human

Produces a per-pixel relative depth map
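Because the output is relative depth rather than metric depth, the raw values are mainly meaningful in comparison with each other. A common way to inspect such a map is to rescale it to the range [0, 1] over the pixels of interest. The following is a minimal sketch of that idea. The function name `normalize_relative_depth` and the optional foreground `mask` argument are hypothetical and not part of the model's interface.

```python
import numpy as np


def normalize_relative_depth(depth, mask=None):
    """Min-max scale a relative depth map to [0, 1] over the masked pixels.

    Pixels outside the mask are set to NaN so they can be ignored when
    visualizing. If no mask is given, every pixel is used.
    """
    depth = np.asarray(depth, dtype=np.float32)
    if mask is None:
        mask = np.ones(depth.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)

    out = np.full(depth.shape, np.nan, dtype=np.float32)
    if not mask.any():
        return out  # nothing to normalize

    lo, hi = depth[mask].min(), depth[mask].max()
    if hi > lo:
        out[mask] = (depth[mask] - lo) / (hi - lo)
    else:
        out[mask] = 0.0  # constant depth: avoid division by zero
    return out
```

Restricting the scaling to a person mask keeps background values from compressing the range used for the subject.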

Property	Details
Image Size	1024 x 768 (H x W)
Num Parameters	2.163 B
FLOPs	8.709 TFLOPs
Patch Size	16 x 16
Embedding Dimensions	1920
Num Layers	48
Num Heads	32
Feedforward Channels	7680
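The figures in the table can be cross-checked with a little arithmetic. The sketch below derives the token count, per-head dimension, and feedforward expansion ratio, and makes a rough estimate of the encoder's weight count. The estimate assumes a standard Transformer block (four attention projections plus two feedforward projections) and ignores biases, normalization layers, embeddings, and any task head.

```python
# Values copied from the property table.
IMAGE_H, IMAGE_W = 1024, 768
PATCH = 16
EMBED_DIM = 1920
NUM_LAYERS = 48
NUM_HEADS = 32
FFN_DIM = 7680

tokens = (IMAGE_H // PATCH) * (IMAGE_W // PATCH)  # 64 * 48 patches
head_dim = EMBED_DIM // NUM_HEADS                 # channels per attention head
ffn_ratio = FFN_DIM // EMBED_DIM                  # feedforward expansion factor

# Weight matrices only, assuming a standard Transformer block:
# Q, K, V, and output projections, plus the two feedforward projections.
attn_weights = 4 * EMBED_DIM * EMBED_DIM
ffn_weights = 2 * EMBED_DIM * FFN_DIM
approx_encoder_params = NUM_LAYERS * (attn_weights + ffn_weights)

print(tokens, head_dim, ffn_ratio, approx_encoder_params)
# prints: 3072 60 4 2123366400
```

The estimate of roughly 2.12 B is close to the 2.163 B listed in the table. The difference is plausibly made up by the components the estimate leaves out.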
