Sapiens-Pretrain-0.6B Open-Source Model - Pre-trained on a large corpus of human images for human-centric vision tasks

Sapiens Pretrain 0.6B

Developed by Facebook

Sapiens is a Vision Transformer model pre-trained on 300 million human images at 1024×1024 resolution and intended for human-centric vision tasks.

Tags: Image Classification, English, High-resolution human vision, Synthetic data generalization, 1K image processing

Release Date: 9/10/2024

Model Overview

A Vision Transformer model with roughly 0.6 billion parameters (0.664 B as listed in the details table) and native support for 1K (1024×1024) high-resolution inference. It is reported to generalize well to real data even when annotations are scarce or the training data is fully synthetic.

Model Features

High-resolution support

Native support for 1024×1024 resolution image processing

Data efficiency

Maintains good generalization even with scarce annotations or fully synthetic data

Large-scale pretraining

Pre-trained on 300 million human images
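
To make the high-resolution claim concrete, the sketch below counts the patch tokens a Vision Transformer processes at a given input size. It assumes non-overlapping square patches and uses the 16 x 16 patch size listed in the details table. The function name patch_grid is illustrative only and is not part of any Sapiens API.

```python
def patch_grid(image_size: int, patch_size: int = 16) -> tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square image.

    Assumes non-overlapping square patches, as in a standard ViT.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


if __name__ == "__main__":
    # Compare a common low-resolution input with the native 1K input.
    for size in (224, 1024):
        side, tokens = patch_grid(size)
        print(f"{size}x{size}: {side}x{side} grid, {tokens} tokens")
```

At 1024×1024 this yields a 64×64 grid of 4096 tokens, against 196 tokens at 224×224, which is about 21 times more tokens per image.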

Model Capabilities

Human image feature extraction

High-resolution image processing

Visual representation learning

Use Cases

Computer vision

Human pose estimation

Extracts human pose features from high-resolution images

Virtual avatar generation

Used for generating realistic digital human avatars

Property	Details
Image Size	1024 x 1024
Num Parameters	0.664 B
FLOPs	2.583 TFLOPs
Patch Size	16 x 16
Embedding Dimensions	1280
Num Layers	32
Num Heads	16
Feedforward Channels	5120
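
As a rough sanity check on the table, the following sketch estimates the encoder parameter count from the listed dimensions. It assumes a standard ViT block (query/key/value and output projections, a two-layer MLP, two LayerNorms, all with biases), a 3-channel input, and one learned position embedding per patch. None of these details is confirmed by the source, so the estimate should land near the reported 0.664 B rather than match it exactly. The function name estimate_vit_params is hypothetical.

```python
def estimate_vit_params(dim=1280, layers=32, ffn=5120,
                        patch=16, image=1024, channels=3):
    """Back-of-the-envelope parameter count for a plain ViT encoder.

    All structural choices here are assumptions (standard ViT block,
    biases everywhere, learned per-patch position embeddings).
    """
    # Attention: Q, K, V and output projections (4 dim x dim matrices) plus biases.
    attn = 4 * dim * dim + 4 * dim
    # MLP: dim -> ffn -> dim, with biases.
    mlp = 2 * dim * ffn + ffn + dim
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * 2 * dim
    per_layer = attn + mlp + norms

    # Patch embedding: linear map from a flattened patch to dim, plus bias.
    patch_embed = patch * patch * channels * dim + dim
    # One learned position vector per patch token.
    pos_embed = (image // patch) ** 2 * dim

    return layers * per_layer + patch_embed + pos_embed


if __name__ == "__main__":
    total = estimate_vit_params()
    print(f"estimated parameters: {total / 1e9:.3f} B")
```

Under these assumptions the estimate comes out at roughly 0.64 B, the same order as the 0.664 B in the table. The remaining difference would come from components this sketch does not model.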
