Sapiens-Pretrain-0.3B: Open-Source Vision Model Suitable for Human-Centric Vision Tasks

Sapiens Pretrain 0.3b

Developed by facebook

Sapiens is a vision Transformer model pretrained on 300 million high-resolution human images, specifically designed for human-centric vision tasks.

Image Classification | English | #High-resolution vision | #Human image feature extraction | #300M-parameter ViT

Downloads 34

Release Time: 9/10/2024

Model Overview

Sapiens-0.3B is a high-resolution vision Transformer pretrained on 300 million human images at 1024x1024 resolution. It targets human-centric vision tasks and generalizes well to real-world scenarios.

Model Features

High-resolution processing capability

Natively processes 1024x1024 inputs, so images at that resolution can be handled directly without downsampling.

Human-centric pretraining

Pretrained on 300 million human images, making it particularly suitable for human-centric vision tasks.

Exceptional generalization performance

Generalizes well to real-world data, even when labeled data is scarce or entirely synthetic.

Efficient architecture design

Uses a 16x16 patch size and 1024-dimensional embeddings to balance computational cost against performance.
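To make the patch arithmetic concrete, the short sketch below works out how many patch tokens a 1024x1024 image yields at a 16x16 patch size. The helper name `patch_grid` is hypothetical and is not part of any Sapiens code; it only illustrates the standard ViT patch-splitting arithmetic.

```python
from typing import Tuple


def patch_grid(image_size: int, patch_size: int) -> Tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square image.

    Hypothetical helper for illustration; assumes non-overlapping square patches.
    """
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


per_side, num_tokens = patch_grid(1024, 16)
print(per_side, num_tokens)  # 64 4096
```

Each image therefore becomes a 64x64 grid of patches, which is the sequence length the Transformer layers attend over (ignoring any extra special tokens, which the source does not describe).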

Model Capabilities

High-resolution image feature extraction

Human image analysis

Visual representation learning

Transfer learning foundation model

Use Cases

Computer vision

Human pose estimation

Utilizes pretrained features for human keypoint detection and pose analysis.

Achieves good performance even with limited labeled data

Person re-identification

Used for cross-camera person feature extraction and matching tasks.

High-resolution processing capability improves recognition accuracy
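The matching step in person re-identification is commonly done by comparing feature vectors with cosine similarity. The sketch below shows that step only, under stated assumptions: the tiny 3-dimensional vectors stand in for real model features, and the names `cosine_similarity`, `best_match`, `person_a`, and `person_b` are hypothetical. It does not show how features are extracted from Sapiens.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def best_match(query: List[float], gallery: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the gallery identity whose features are most similar to the query."""
    scored = ((name, cosine_similarity(query, feats)) for name, feats in gallery.items())
    return max(scored, key=lambda item: item[1])


# Toy placeholder features; real features would come from the pretrained backbone.
gallery = {
    "person_a": [0.9, 0.1, 1.1],
    "person_b": [-1.0, 1.0, 0.0],
}
query = [1.0, 0.0, 1.0]

name, score = best_match(query, gallery)
print(name)  # person_a
```

In practice the gallery would hold one or more feature vectors per identity captured by other cameras, and a similarity threshold would decide whether the best match is accepted.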

Virtual reality

Digital human modeling

Serves as a foundation model for generating realistic digital human avatars.

Transfers well from synthetic data to real-world scenarios

Property	Details
Image Size	1024 x 1024
Num Parameters	0.336 B
FLOPs	1.242 TFLOPs
Patch Size	16 x 16
Embedding Dimensions	1024
Num Layers	24
Num Heads	16
Feedforward Channels	4096
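The table values can be cross-checked with a little arithmetic. The sketch below stores them in a hypothetical `ViTConfig` dataclass, derives the per-head dimension, and makes a rough encoder parameter estimate. It assumes a textbook ViT layout (biased linear layers, two LayerNorms per block, learned position embeddings, RGB input), none of which the source confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical container for the values in the table above."""
    image_size: int = 1024
    patch_size: int = 16
    embed_dim: int = 1024
    num_layers: int = 24
    num_heads: int = 16
    ffn_dim: int = 4096
    in_channels: int = 3  # assumption: RGB input

    @property
    def head_dim(self) -> int:
        return self.embed_dim // self.num_heads

    @property
    def num_patches(self) -> int:
        return (self.image_size // self.patch_size) ** 2


def estimate_encoder_params(cfg: ViTConfig) -> int:
    """Rough parameter count for a textbook ViT encoder (an assumption)."""
    d, f = cfg.embed_dim, cfg.ffn_dim
    attention = 4 * d * d + 4 * d        # Q, K, V and output projections with biases
    mlp = 2 * d * f + f + d              # two linear layers with biases
    norms = 2 * 2 * d                    # two LayerNorms (scale and shift)
    per_layer = attention + mlp + norms
    patch_embed = cfg.patch_size ** 2 * cfg.in_channels * d + d
    pos_embed = cfg.num_patches * d
    final_norm = 2 * d
    return cfg.num_layers * per_layer + patch_embed + pos_embed + final_norm


cfg = ViTConfig()
print(cfg.head_dim)  # 64
print(round(estimate_encoder_params(cfg) / 1e9, 3))  # 0.307
```

Under these assumptions the estimate comes to about 0.307 B, somewhat below the 0.336 B in the table. The source does not explain the difference, so the reported figure presumably counts components this simplified layout leaves out.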
