Sapiens-depth-1b-bfloat16: Open-Source Vision Model for Human-Centric Vision Tasks

Sapiens Depth 1b Bfloat16

Developed by Meta (facebook)

Sapiens is a Vision Transformer model pre-trained on 300 million portrait images at 1024 x 1024 resolution and designed for human-centric vision tasks.

Tags: 3D Vision, English, Portrait Depth Estimation, High-Resolution Vision, Billion-Parameter Model

Downloads 37

Release Date: 9/10/2024

Model Overview

This model estimates relative depth for portrait images and supports 1K high-resolution inference. It generalizes well to real-world data even when labeled training data is scarce or entirely synthetic.

Model Features

High-Resolution Support

Native support for 1K high-resolution inference at an input size of 1024 x 768 (H x W).

Large-Scale Pre-Training

Pre-trained on 300 million portrait images at 1024 x 1024 resolution.

Exceptional Generalization

Generalizes well to real-world data even when labeled training data is scarce or entirely synthetic.

Model Capabilities

Portrait Image Depth Estimation

High-Resolution Image Processing

Use Cases

Computer Vision

Portrait Depth Estimation

Estimates relative depth for portrait images.

Generalizes well to real-world data.
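
Because the depth is relative rather than metric, the raw values have no physical unit and are usually rescaled before display or comparison. The following is a minimal sketch of min-max normalization over an optional foreground mask. The function name, the nested-list input format, and the mask convention are illustrative assumptions, not part of the Sapiens code.

```python
def normalize_relative_depth(depth, mask=None):
    """Rescale relative depth values to [0, 1].

    depth: list of rows of floats (hypothetical raw model output).
    mask:  optional list of rows of bools; True marks pixels to keep.
           This convention is an assumption made for this sketch.
    Pixels outside the mask are set to 0.0.
    """
    if mask is None:
        mask = [[True] * len(row) for row in depth]

    # Collect only the values that belong to the foreground.
    valid = [v for d_row, m_row in zip(depth, mask)
             for v, keep in zip(d_row, m_row) if keep]
    if not valid:
        return [[0.0] * len(row) for row in depth]

    lo, hi = min(valid), max(valid)
    span = hi - lo
    if span == 0:
        # A constant depth map carries no relative ordering.
        return [[0.0] * len(row) for row in depth]

    return [[(v - lo) / span if keep else 0.0
             for v, keep in zip(d_row, m_row)]
            for d_row, m_row in zip(depth, mask)]
```

The result preserves the ordering of near and far pixels but not absolute distances, which is all a relative depth map can provide.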

Property	Details
Developed by	Meta
Model Type	Vision Transformer
License	Creative Commons Attribution - NonCommercial 4.0
Task	Depth estimation
Format	bfloat16
File	sapiens_1b_render_people_epoch_88_bfloat16.pt2
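
To illustrate what the bfloat16 format in the table implies, here is a small sketch using only the Python standard library. bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of a float32, so each weight takes 2 bytes. The helper names are hypothetical, and the conversion truncates for simplicity, whereas real frameworks typically round to nearest.

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (truncation, not rounding)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # keep the upper half of the float32 pattern

def from_bfloat16_bits(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value

def weight_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Rough storage needed for the weights alone."""
    return num_params * bytes_per_param

# 3.14159 survives with only about 3 significant decimal digits.
print(from_bfloat16_bits(to_bfloat16_bits(3.14159)))  # 3.140625
# Using the 1.169 B parameter count reported in this card.
print(weight_bytes(1_169_000_000))  # 2338000000 bytes, about 2.34 GB
```

The storage estimate covers weights only; activations at 1024 x 768 input add to the memory needed at inference time.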

Property	Details
Image Size	1024 x 768 (H x W)
Num Parameters	1.169 B
FLOPs	4.647 TFLOPs
Patch Size	16 x 16
Embedding Dimensions	1536
Num Layers	40
Num Heads	24
Feedforward Channels	6144
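
The figures in this table can be cross-checked with a little arithmetic. The sketch below derives the patch grid, token count, and per-head dimension, and estimates the encoder parameter count. The estimate assumes a standard Vision Transformer block (four attention projection matrices plus a two-layer feedforward network) and ignores biases, normalization layers, embeddings, and the depth head, so it is expected to fall somewhat below the reported 1.169 B.

```python
# Values copied from the table above.
IMAGE_H, IMAGE_W = 1024, 768
PATCH = 16
EMBED_DIM = 1536
NUM_LAYERS = 40
NUM_HEADS = 24
FFN_CHANNELS = 6144

def derived_shapes():
    """Quantities that follow directly from the table values."""
    patches_h = IMAGE_H // PATCH
    patches_w = IMAGE_W // PATCH
    return {
        "patch_grid": (patches_h, patches_w),
        "num_tokens": patches_h * patches_w,
        "head_dim": EMBED_DIM // NUM_HEADS,
        "ffn_ratio": FFN_CHANNELS // EMBED_DIM,
    }

def approx_encoder_params():
    """Weight-matrix count under the standard ViT block assumption."""
    attn = 4 * EMBED_DIM * EMBED_DIM    # Q, K, V and output projections
    ffn = 2 * EMBED_DIM * FFN_CHANNELS  # up and down projections
    return NUM_LAYERS * (attn + ffn)

print(derived_shapes())
# {'patch_grid': (64, 48), 'num_tokens': 3072, 'head_dim': 64, 'ffn_ratio': 4}
print(approx_encoder_params())  # 1132462080, roughly 1.13 B
```

Under these assumptions the transformer blocks account for about 1.13 B of the 1.169 B parameters, and each 1024 x 768 image becomes a sequence of 3072 patch tokens.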
