Open-source Sapiens-seg-1b-torchscript model - Ideal for human-centric vision tasks with strong generalization ability

Sapiens Seg 1b Torchscript

Developed by facebook

Sapiens is a series of vision transformers pre-trained on 300 million 1024×1024 resolution human images, specifically designed for human-centric vision tasks with exceptional generalization capabilities.

Image Segmentation · English · #High-resolution human segmentation #28 body part categories #ViT large model

Downloads: 892

Release date: 9/9/2024

Model Overview

This model is a 1.169-billion-parameter vision transformer fine-tuned for high-resolution segmentation of human images into 28 body part categories.

Model Features

High-resolution support

Natively supports 1K high-resolution inference (1024×768), ideal for precise human body part segmentation.
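To see what 1K inference means for the transformer, divide the 1024 × 768 input by the 16 × 16 patch size listed in the model card. This gives the number of patch tokens every layer attends over. The sketch below assumes non-overlapping patches with no padding, as in a standard ViT patch embedding. `patch_grid` is a hypothetical helper name.

```python
def patch_grid(height: int, width: int, patch: int = 16) -> tuple[int, int, int]:
    """Return (rows, cols, tokens) for a non-overlapping patch embedding.

    Assumes height and width are exact multiples of the patch size, as they
    are for the 1024 x 768 (H x W) input listed in the model card.
    """
    if height % patch or width % patch:
        raise ValueError("input size must be divisible by the patch size")
    rows, cols = height // patch, width // patch
    return rows, cols, rows * cols


rows, cols, tokens = patch_grid(1024, 768)
print(rows, cols, tokens)  # 64 48 3072
```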

Strong generalization capability

Generalizes exceptionally well to real-world (in-the-wild) data, even when labeled data is scarce or entirely synthetic.

Large-scale pre-training

Pre-trained on 300 million human images at 1024×1024 resolution, which gives it rich visual representations.

Model Capabilities

Human image segmentation

28 body part recognition

High-resolution image processing

Use Cases

Medical imaging

Surgical planning assistance

Precise segmentation and visualization of human body parts before surgery

May help improve surgical planning accuracy

Virtual try-on

Virtual garment fitting

Accurate body part segmentation for more realistic virtual try-on effects

Enhances e-commerce user experience
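For virtual try-on, the per-pixel label map is typically reduced to a binary mask over the garment-bearing body parts. New clothing is then composited only inside that mask. The PyTorch sketch below shows that step.

Several assumptions apply:
- The indices in `HYPOTHETICAL_GARMENT_IDS` are placeholders, not the model's actual index-to-part mapping.
- `region_mask` and `paste_inside` are hypothetical helper names.
- A real try-on system would warp and blend the garment rather than copy pixels directly.

```python
import torch

# Placeholder class indices; look up the real index-to-body-part mapping
# in the Sapiens repository before using this.
HYPOTHETICAL_GARMENT_IDS = (5, 6, 10)


def region_mask(label_map: torch.Tensor, class_ids) -> torch.Tensor:
    """Return an (H, W) bool mask that is True where the pixel's class is in class_ids."""
    ids = torch.as_tensor(class_ids, dtype=label_map.dtype, device=label_map.device)
    return torch.isin(label_map, ids)


def paste_inside(mask: torch.Tensor, person: torch.Tensor, garment: torch.Tensor) -> torch.Tensor:
    """Copy garment pixels into the person image wherever mask is True.

    person and garment are (H, W, 3) images with the same size and dtype;
    the mask gets a trailing channel axis so it broadcasts over RGB.
    """
    return torch.where(mask.unsqueeze(-1), garment, person)


# Usage on a toy 2 x 2 label map:
labels = torch.tensor([[0, 5], [6, 27]])
mask = region_mask(labels, HYPOTHETICAL_GARMENT_IDS)
print(mask.tolist())  # [[False, True], [True, False]]
```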

🚀 Seg-Sapiens-1B-Torchscript

Seg-Sapiens-1B-Torchscript is a vision transformer model for human image segmentation, offering high-resolution inference and strong generalization capabilities.

🚀 Quick Start

This section covers the basic information about the Seg-Sapiens-1B-Torchscript model.

✨ Features

Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution. When fine-tuned for human-centric vision tasks, these models can generalize well to in-the-wild conditions.
Sapiens-1B natively supports 1K high-resolution inference. The models show remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic.
The Seg 1B model can perform 28-class body part segmentation on human images.
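A 28-class segmentation output is usually turned into a label map by taking the most likely class at each pixel. The sketch below assumes the TorchScript module returns raw logits shaped (N, 28, h, w), possibly at a lower resolution than the input. This card does not document that output layout, so treat it as an assumption. `logits_to_labels` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 28  # 28 body-part classes, per the model description


def logits_to_labels(logits: torch.Tensor, out_hw: tuple[int, int]) -> torch.Tensor:
    """Turn (1, 28, h, w) segmentation logits into an (H, W) map of class indices.

    Assumes one logit channel per class; the logits are bilinearly
    upsampled to out_hw before taking the per-pixel argmax.
    """
    if logits.dim() != 4 or logits.shape[1] != NUM_CLASSES:
        raise ValueError(f"expected (N, {NUM_CLASSES}, h, w), got {tuple(logits.shape)}")
    upsampled = F.interpolate(logits, size=out_hw, mode="bilinear", align_corners=False)
    # argmax over the class dimension selects the most likely body part per pixel;
    # [0] drops the batch dimension (batch size 1 assumed).
    return upsampled.argmax(dim=1)[0]


# Usage with random logits standing in for a real model output:
labels = logits_to_labels(torch.randn(1, NUM_CLASSES, 64, 48), (256, 192))
print(labels.shape)  # torch.Size([256, 192])
```

This sketch upsamples the logits before the argmax instead of resizing the final label map with nearest-neighbour interpolation. That tends to give smoother part boundaries, at the cost of holding a full-resolution 28-channel tensor in memory.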

📚 Documentation

Model Details

Developed by: Meta
Model type: Vision Transformer
License: Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
Task: seg
Format: torchscript
File: sapiens_1b_goliath_best_goliath_mIoU_7994_epoch_151_torchscript.pt2
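The checkpoint above is listed in TorchScript format. A minimal loading and preprocessing sketch follows, under these assumptions:
- The archive loads with `torch.jit.load` despite its `.pt2` extension.
- The model expects RGB input resized to 1024 × 768 (H × W).
- Input is normalized with the per-channel mean and standard deviation below. These constants are recalled from the Sapiens repository's demo code rather than stated in this card, so verify them there.

`preprocess` and `load_model` are hypothetical helper names.

```python
import torch
import torch.nn.functional as F

# Assumed preprocessing constants (RGB order, 0-255 scale); verify against the
# Sapiens repository before relying on the outputs.
MEAN = torch.tensor([123.5, 116.5, 103.5]).view(1, 3, 1, 1)
STD = torch.tensor([58.5, 57.0, 57.5]).view(1, 3, 1, 1)
INPUT_HW = (1024, 768)  # model card: 1024 x 768 (H x W)


def preprocess(image: torch.Tensor) -> torch.Tensor:
    """Convert an (H, W, 3) uint8 RGB image into a normalized (1, 3, 1024, 768) batch."""
    x = image.permute(2, 0, 1).unsqueeze(0).float()  # HWC -> NCHW, as float32
    x = F.interpolate(x, size=INPUT_HW, mode="bilinear", align_corners=False)
    return (x - MEAN) / STD


def load_model(path: str, device: str = "cpu") -> torch.jit.ScriptModule:
    """Load the TorchScript archive onto `device` and switch it to eval mode."""
    return torch.jit.load(path, map_location=device).eval()


# Usage (needs the checkpoint downloaded locally):
# model = load_model("sapiens_1b_goliath_best_goliath_mIoU_7994_epoch_151_torchscript.pt2")
# with torch.inference_mode():
#     logits = model(preprocess(image))
```

Resizing straight to 1024 × 768 distorts images with a different aspect ratio. Padding or cropping to a 4:3 (H:W) region first is one way to avoid that.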

Model Card

Property	Details
Image Size	1024 x 768 (H x W)
Num Parameters	1.169 B
FLOPs	4.647 TFLOPs
Patch Size	16 x 16
Embedding Dimensions	1536
Num Layers	40
Num Heads	24
Feedforward Channels	6144
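The table's figures can be sanity-checked with a rough cost model. The sketch below assumes a standard ViT block: fused QKV projection, output projection, a two-layer MLP and two LayerNorms, all with biases. It ignores the segmentation head, position embeddings, softmax and normalization arithmetic, so it is an estimate rather than an official count. `block_params` and `block_macs` are hypothetical helper names.

```python
# Back-of-the-envelope cost model for the ViT backbone in the model card.
# Assumes a standard ViT block layout; the head and embeddings are not counted.

def block_params(d: int, ff: int) -> int:
    attn = 3 * d * d + 3 * d + d * d + d   # QKV + output projection (weights + biases)
    mlp = d * ff + ff + ff * d + d         # fc1 + fc2 (weights + biases)
    norms = 2 * 2 * d                      # two LayerNorms (scale + shift)
    return attn + mlp + norms


def block_macs(d: int, ff: int, tokens: int) -> int:
    linear = tokens * (4 * d * d + 2 * d * ff)  # matrix multiplies against weights
    attention = 2 * tokens * tokens * d         # Q @ K^T and attention @ V
    return linear + attention


D, FF, LAYERS, PATCH = 1536, 6144, 40, 16
TOKENS = (1024 // PATCH) * (768 // PATCH)            # 3072 patch tokens
patch_embed_macs = TOKENS * (3 * PATCH * PATCH * D)  # 16x16x3 patch -> 1536-d embedding

params = LAYERS * block_params(D, FF)
macs = LAYERS * block_macs(D, FF, TOKENS) + patch_embed_macs

print(f"transformer-block params: {params / 1e9:.3f} B")  # 1.133 B
print(f"multiply-accumulates:     {macs / 1e12:.3f} T")   # 4.642 T
```

Under these assumptions, the 40 blocks account for about 1.13 B of the listed 1.169 B parameters. The multiply-accumulate count comes out near 4.64 T, close to the listed 4.647 TFLOPs. That agreement suggests the FLOPs figure counts one multiply-add as one operation, though the model card does not say so.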

More Resources

Repository: https://github.com/facebookresearch/sapiens
Paper: https://arxiv.org/abs/2408.12569
Demo: https://huggingface.co/spaces/facebook/sapiens-seg
Project Page: https://about.meta.com/realitylabs/codecavatars/sapiens
Additional Results: https://rawalkhirodkar.github.io/sapiens
HuggingFace Collection: https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc

📄 License

The model is licensed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license.
