Sapiens Pretrain 1B
Sapiens is a vision Transformer model pretrained on 300 million high-resolution human images, designed for human-centric vision tasks.
Release Time: 9/10/2024
Model Overview
Sapiens-1B is a 1-billion-parameter vision Transformer pretrained on a large-scale human image dataset. It supports native 1K high-resolution inference and generalizes well even when labeled data is scarce or fully synthetic.
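As a rough sketch of what 1K inference implies for a vision Transformer: the input is split into fixed-size patches, each becoming one token, so sequence length grows quadratically with resolution. The patch size of 16 below is an illustrative assumption, not a value stated in this card.

```python
# Back-of-the-envelope token count for a ViT at Sapiens' 1024x1024 input size.
# Patch size 16 is assumed for illustration; the actual Sapiens patch size may differ.
def vit_token_count(image_size: int, patch_size: int) -> int:
    """Number of patch tokens a ViT produces for a square input image."""
    assert image_size % patch_size == 0, "image must tile evenly into patches"
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side

print(vit_token_count(1024, 16))  # 4096 patch tokens at 1K resolution
print(vit_token_count(224, 16))   # 196 tokens at a typical 224x224 input
```

Under this assumption, 1K input yields roughly 20× more tokens than a standard 224×224 ViT input, which is what makes native high-resolution pretraining costly and detail-preserving.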
Model Features
High-resolution processing
Native support for 1024×1024 resolution image input, preserving rich visual details
Data efficiency
Maintains strong performance even with scarce labeled data or fully synthetic data
Large-scale pretraining
Pretrained on 300 million human images, learning rich human feature representations
Real-world generalization
After fine-tuning on human-centric vision tasks, the model generalizes effectively to real-world scenarios
Model Capabilities
Human image feature extraction
High-resolution image processing
Visual representation learning
Transfer learning foundation model
Use Cases
Computer vision
Human pose analysis
Extracts human pose features from high-resolution images
Virtual avatar generation
Serves as the foundation model for the Codec Avatar project, supporting high-fidelity virtual avatar generation
Medical imaging
Medical image analysis
Assists in human feature extraction and analysis from medical images
© 2025 AIbase