Sapiens Pretrain 2B (bfloat16)
Sapiens is a family of Vision Transformer models pre-trained on 300 million human images at 1024x1024 resolution, supporting high-resolution inference and generalization to real-world scenarios.
Release date: 9/10/2024
Model Overview
Sapiens-2B is a pre-trained model based on the Vision Transformer architecture, designed specifically for human-centric vision tasks. It generalizes well to real data even when annotations are scarce or the training data is fully synthetic.
Model Features
High-resolution support
Natively processes images at 1024x1024 resolution, making it well suited to high-quality visual data.
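Native 1024x1024 processing matters because a Vision Transformer's sequence length grows quadratically with input resolution. As a rough sketch (the 16x16 patch size below is a common ViT default and an assumption here, not a confirmed Sapiens hyperparameter), the token count can be computed as:

```python
def vit_token_count(image_size: int, patch_size: int) -> int:
    """Number of patch tokens a ViT produces for a square image."""
    # The image must tile evenly into non-overlapping patches.
    assert image_size % patch_size == 0, "image must tile evenly into patches"
    return (image_size // patch_size) ** 2

# 1024x1024 input with an assumed 16x16 patch size:
print(vit_token_count(1024, 16))  # 4096 tokens
# versus a standard 224x224 ViT input at the same patch size:
print(vit_token_count(224, 16))   # 196 tokens
```

At the same patch size, the 1024x1024 input produces roughly 21x more tokens than a standard 224x224 input, which is why native high-resolution pre-training, rather than upsampling a low-resolution model, is a distinguishing feature.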
Large-scale pre-training
Pre-trained on 300 million human images, featuring powerful feature extraction capabilities.
Real-world generalization
Generalizes strongly to real-world data even when annotations are scarce or the training data is fully synthetic.
Efficient computation
Ships in the bfloat16 numeric format, halving memory use relative to float32 while preserving its full dynamic range.
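The trade-off bfloat16 makes is easy to see numerically: it keeps float32's 8-bit exponent (so the dynamic range is unchanged) but drops the low 16 mantissa bits, leaving roughly 2-3 decimal digits of precision. A minimal stdlib-only sketch of that truncation:

```python
import struct

def to_bfloat16(x: float) -> float:
    """Truncate a float32 value to bfloat16 precision (keep the top 16 bits)."""
    # Reinterpret the float32 bit pattern as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    # Zero the low 16 bits: the exponent survives, most mantissa bits do not.
    bits &= 0xFFFF0000
    (y,) = struct.unpack(">f", struct.pack(">I", bits))
    return y

print(to_bfloat16(1.0))         # 1.0 -- exactly representable
print(to_bfloat16(3.14159265))  # 3.140625 -- only ~3 digits survive
```

(Real hardware rounds to nearest rather than truncating, but the precision budget is the same.) This is why bfloat16 halves memory and bandwidth at a small accuracy cost, which large pre-trained checkpoints like this one exploit.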
Model Capabilities
High-resolution image processing
Human image feature extraction
Vision task fine-tuning
Real-world scenario generalization
Use Cases
Computer vision
Human pose estimation
Utilizes pre-trained features for human pose recognition and analysis.
Face recognition
Extracts and recognizes facial features from high-resolution images.
Augmented reality
Virtual avatar generation
Used to generate realistic virtual human avatars.