Sapiens Depth 2B (bfloat16)
Sapiens-2B is a vision Transformer pre-trained on 300 million high-resolution human images and optimized for human depth estimation. It supports 1K-resolution inference and generalizes well to real-world scenarios.
Release date: 9/10/2024
Model Overview
This model is a 2.1-billion-parameter vision Transformer developed by Meta for relative depth estimation on human images. It performs well on both synthetic and real-world data.
Model Features
High-Resolution Support
Natively supports 1K-resolution input, processing human images at up to 1024×768 pixels.
Synthetic Data Generalization
Maintains excellent generalization capabilities for real-world data even when trained entirely on synthetic data.
Efficient Computation
Stored and run in the bfloat16 data format; a single forward pass requires roughly 8.709 trillion floating-point operations (8.709 TFLOPs).
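The bfloat16 format mentioned above keeps float32's full 8-bit exponent but truncates the mantissa to 7 bits, halving memory per weight while preserving dynamic range. A minimal pure-Python sketch of the conversion (bfloat16 is simply the top 16 bits of a float32; this is an illustration, not code from the Sapiens release):

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16

def from_bfloat16_bits(b: int) -> float:
    """Expand 16 bfloat16 bits back to a float32 value (low bits zeroed)."""
    (x,) = struct.unpack(">f", struct.pack(">I", b << 16))
    return x

# Exactly representable values round-trip unchanged...
print(from_bfloat16_bits(to_bfloat16_bits(1.0)))
# ...and very large magnitudes stay finite because the exponent is intact,
# with at most ~0.4% relative error from the truncated mantissa.
print(from_bfloat16_bits(to_bfloat16_bits(3e38)))
```

Because the exponent field is unchanged, converting float32 weights to bfloat16 never overflows, which is why it is a popular inference format for large models like this one.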
Model Capabilities
Human Depth Estimation
High-Resolution Image Processing
Transfer Learning from Synthetic to Real-World Scenarios
Use Cases
Virtual Reality
3D Human Modeling
Generates human depth information from a single image for 3D modeling.
Can produce accurate relative depth maps.
Film Special Effects
Depth-Aware Effects
Provides human depth information for post-production in films.
Supports more realistic depth-of-field effects and virtual scene integration.
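One way a depth map drives a depth-of-field effect is by assigning each pixel a blur radius that grows with its distance from the focal plane. A deliberately simplified circle-of-confusion sketch (a toy model for illustration, not the pipeline used in production tools):

```python
def blur_radius(depth: float, focus: float, max_radius: float = 8.0) -> float:
    """Blur radius (pixels) for a pixel at `depth`, focused at `focus`.

    Pixels on the focal plane stay sharp (radius 0); blur grows linearly
    with distance from the plane and is clamped at max_radius.
    """
    return min(max_radius, max_radius * abs(depth - focus))

# Pixel on the focal plane is sharp; a far pixel hits the clamp.
print(blur_radius(0.5, focus=0.5))
print(blur_radius(1.5, focus=0.5))
```

A compositor would then apply a per-pixel blur kernel of that radius, which is what makes the synthetic depth of field track the human subject correctly.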