Sapiens Pretrain 2b
Sapiens-2B is a Vision Transformer model pre-trained on 300 million high-resolution human images, specifically designed for human-centric vision tasks with exceptional generalization capabilities.
Release date: 9/10/2024
Model Overview
Sapiens-2B is a 2.163-billion-parameter Vision Transformer pre-trained on 1024×1024 resolution human images. Optimized for human-centric vision tasks, it generalizes well to real-world data even when labeled data is scarce or fully synthetic.
Model Features
High-resolution support
Native support for processing high-resolution images at 1024×1024 pixels
Large-scale pre-training
Pre-trained on 300 million human images with powerful feature extraction capabilities
Exceptional generalization
Generalizes well to real-world data even when labeled data is scarce or fully synthetic
Efficient architecture
Adopts a Vision Transformer architecture with 48 transformer layers and 32 attention heads
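The stated parameter count can be sanity-checked from the architecture above. This is a rough sketch: the depth (48 layers) comes from this card, while the embedding width (1920) and patch size (16) are assumptions typical of ViT variants of this scale, not values stated here.

```python
def vit_param_estimate(depth, width, patch_size=16, in_chans=3):
    """Approximate parameter count of a plain ViT encoder."""
    # Patch-embedding projection: (C * P * P) -> width, plus bias.
    patch_embed = in_chans * patch_size * patch_size * width + width
    # Per transformer block: attention (QKV + output projection) ~ 4*w^2,
    # MLP with expansion ratio 4 ~ 8*w^2; biases and LayerNorms are
    # negligible at this scale and ignored here.
    per_block = 12 * width * width
    return patch_embed + depth * per_block

params = vit_param_estimate(depth=48, width=1920)
print(f"{params / 1e9:.2f}B parameters")  # prints "2.12B parameters"

# Token count at the native 1024x1024 resolution with 16-pixel patches:
tokens = (1024 // 16) ** 2
print(tokens)  # prints 4096
```

The estimate (~2.12B) lands close to the stated 2.163B; the gap is consistent with the omitted biases, norms, and positional embeddings.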
Model Capabilities
Human image feature extraction
High-resolution image processing
Visual representation learning
Transfer learning
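As a sketch of how the feature-extraction and transfer-learning capabilities above are typically combined, the snippet below mean-pools per-patch embeddings into an image-level feature and applies a linear probe. Everything here is illustrative: random arrays stand in for real Sapiens features, and the embedding width (1920) and 17-way output head are assumptions, not values from this card.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for encoder output: 4096 patch tokens of 1920-dim embeddings
# (random values replace real Sapiens features; width is an assumption).
tokens = rng.standard_normal((4096, 1920))

# Mean-pool patch tokens into a single image-level feature vector.
feature = tokens.mean(axis=0)  # shape: (1920,)

# Linear probe on frozen features -> task logits. The 17-way head is
# purely illustrative (e.g., per-keypoint presence in pose estimation).
W = rng.standard_normal((1920, 17)) * 0.01
b = np.zeros(17)
logits = feature @ W + b       # shape: (17,)
print(logits.shape)            # prints (17,)
```

Only the small probe head would be trained for a downstream task; the pretrained encoder stays frozen, which is what makes the model useful when labeled data is scarce.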
Use Cases
Computer vision
Human pose estimation
Used for extracting human pose features from high-resolution images
Face recognition
Can serve as a base feature extractor for face recognition systems
Augmented reality
Virtual avatar generation
Used for generating high-fidelity human virtual avatars