Sapiens Pretrain 1b Torchscript
Sapiens is a family of vision Transformers pre-trained on 300 million human images at 1024x1024 resolution and designed for human-centric vision tasks.
Downloads 35
Release Time: 9/9/2024
Model Overview
Sapiens-1B is a high-resolution vision Transformer pre-trained on large-scale human images. It is suitable for feature extraction and fine-tuning, and it generalizes well to real-world data even when labeled data is scarce or the training data is entirely synthetic.
Model Features
High-resolution support
Native support for 1K high-resolution (1024x1024) image processing
Large-scale pre-training
Pre-trained on 300 million human images, giving the model strong feature extraction capabilities
Real-world generalization
Generalizes well to real-world data even when labeled data is scarce or the training data is entirely synthetic
Efficient architecture
Utilizes a 40-layer vision Transformer architecture with 1536 embedding dimensions and 24 attention heads
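The architecture figures above can be turned into a quick back-of-the-envelope check. The sketch below derives the per-head dimension, the number of patch tokens per 1024x1024 image, and a rough parameter count. The patch size of 16 and the MLP expansion ratio of 4 are assumptions (common vision Transformer defaults), not values stated on this page, and the parameter estimate ignores biases, normalization layers, and embeddings.

```python
# Figures stated in the model description.
IMAGE_SIZE = 1024   # input resolution (pixels per side)
NUM_LAYERS = 40     # Transformer blocks
EMBED_DIM = 1536    # embedding dimension
NUM_HEADS = 24      # attention heads

# Assumptions, not stated on this page: typical ViT defaults.
PATCH_SIZE = 16     # assumed patch side length in pixels
MLP_RATIO = 4       # assumed hidden-layer expansion in each MLP

# Each head attends over an equal slice of the embedding.
head_dim = EMBED_DIM // NUM_HEADS

# Non-overlapping square patches tile the image.
tokens_per_image = (IMAGE_SIZE // PATCH_SIZE) ** 2

# Per block: attention uses four d x d projections (Q, K, V, output);
# the MLP uses two d x (ratio * d) matrices.
params_per_block = (4 + 2 * MLP_RATIO) * EMBED_DIM ** 2
approx_params = NUM_LAYERS * params_per_block

print(head_dim)                        # 64
print(tokens_per_image)                # 4096
print(f"{approx_params / 1e9:.2f}B")   # 1.13B
```

Under these assumptions the estimate lands close to one billion weights, which is consistent with the "1b" in the model name.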
Model Capabilities
High-resolution image processing
Human image feature extraction
Visual representation learning
Transfer learning
Use Cases
Computer vision
Human image analysis
Used for human-centric vision tasks such as pose estimation and action recognition
Demonstrates exceptional generalization in real-world scenarios
Feature extraction
Serves as a pre-trained model for extracting image features for downstream tasks
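Since this release is distributed as TorchScript, feature extraction can be sketched with PyTorch's `torch.jit.load`. The example below is a minimal illustration, not the official usage: the helper names `load_torchscript` and `extract_features` are hypothetical, and a tiny stand-in module is scripted and saved so the snippet runs without the real checkpoint. The real model's file name, input normalization, and output shape are not given on this page and should be checked against the official release.

```python
import tempfile
from pathlib import Path

import torch


def load_torchscript(path: str, device: str = "cpu") -> torch.jit.ScriptModule:
    """Load a TorchScript archive and put it in inference mode."""
    model = torch.jit.load(path, map_location=device)
    model.eval()
    return model


@torch.no_grad()
def extract_features(model, images: torch.Tensor) -> torch.Tensor:
    """Run a batch of images shaped (N, 3, H, W) through the model."""
    return model(images)


class TinyStandIn(torch.nn.Module):
    """Hypothetical stand-in for the real checkpoint: one 16x16 patch projection."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = torch.nn.Conv2d(3, 8, kernel_size=16, stride=16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


with tempfile.TemporaryDirectory() as tmp:
    # Save the stand-in as TorchScript, then load it the same way
    # a downloaded checkpoint would be loaded.
    path = str(Path(tmp) / "stand_in.pt")
    torch.jit.script(TinyStandIn()).save(path)

    model = load_torchscript(path)
    features = extract_features(model, torch.rand(1, 3, 64, 64))

print(tuple(features.shape))
```

With the actual checkpoint, the stand-in would be replaced by the path to the downloaded file and the input by a preprocessed 1024x1024 image batch.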
© 2025 AIbase