Sapiens-seg-foreground-1b-torchscript Open Source Model - Easily Complete Foreground Person Segmentation Tasks

Sapiens Seg Foreground 1b Torchscript

Developed by facebook

Sapiens is a vision transformer model pre-trained on 300 million high-resolution human images, specifically designed for foreground person segmentation tasks.

Image Segmentation English #High-resolution portrait segmentation #Billion-parameter ViT #Real-world scenario generalization

Downloads 25

Release Time: 9/9/2024

Model Overview

This model segments foreground people from images, supports inference at 1K resolution, and generalizes well to real-world scenarios.

Model Features

High-resolution support

Natively supports 1K high-resolution inference with image dimensions up to 1024 x 768.

Large-scale pre-training

Pre-trained on 300 million human images at 1024 x 1024 resolution.

Exceptional generalization

Generalizes well to real-world data even when labeled training data is scarce or entirely synthetic.

Model Capabilities

Foreground person segmentation

High-resolution image processing

Use Cases

Image editing

Person-background separation

Separates foreground people from the background in images, producing high-quality foreground segmentation masks.
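To make the person-background separation use case concrete, here is a minimal sketch of how a binary foreground mask could be applied to an image. It is not part of the Sapiens code base: the `composite` function, the nested-list image representation, and the white default background are all illustrative assumptions, chosen so the example runs without any third-party library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), hypothetical representation


def composite(
    image: List[List[Pixel]],
    mask: List[List[int]],
    background: Pixel = (255, 255, 255),
) -> List[List[Pixel]]:
    """Keep pixels where the mask is 1 and replace the rest with `background`."""
    same_height = len(image) == len(mask)
    same_width = all(len(r) == len(m) for r, m in zip(image, mask))
    if not (same_height and same_width):
        raise ValueError("image and mask must have the same height and width")
    return [
        [px if keep else background for px, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Tiny 2 x 2 example: the mask keeps the top-left and bottom-right pixels.
image = [[(10, 20, 30), (40, 50, 60)], [(1, 2, 3), (70, 80, 90)]]
mask = [[1, 0], [0, 1]]
result = composite(image, mask)
```

In practice the mask would come from the model's output and the image would be an array rather than nested lists, but the per-pixel selection is the same.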

Virtual reality

Avatar creation

Used to create virtual avatars based on real people.

🚀 Seg-Foreground-Background-Sapiens-1B-Torchscript

A vision transformer model for segmenting foreground humans from images, pretrained on a large-scale human image dataset.

🚀 Quick Start

The Seg-Foreground-Background-Sapiens-1B-Torchscript model is designed for human foreground segmentation tasks. It is based on the Sapiens family of vision transformers, which are pretrained on 300 million human images at 1024 x 1024 resolution.

✨ Features

High-Resolution Inference: Sapiens-1B natively supports 1K high-resolution inference.
Good Generalization: The model generalizes well to in-the-wild conditions, even with scarce or synthetic labeled data.

📚 Documentation

Model Details

Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution. When finetuned for human-centric vision tasks, the pretrained models generalize to in-the-wild conditions, even when labeled data is scarce or entirely synthetic. Sapiens-1B natively supports 1K high-resolution inference.

Property	Details
Developed by	Meta
Model Type	Vision Transformer
License	Creative Commons Attribution-NonCommercial 4.0
Task	seg
Format	torchscript
File	sapiens_1b_seg_foreground_epoch_8_torchscript.pt2

Model Card

Property	Details
Image Size	1024 x 768 (H x W)
Num Parameters	1.169 B
FLOPs	4.647 TFLOPs
Patch Size	16 x 16
Embedding Dimensions	1536
Num Layers	40
Num Heads	24
Feedforward Channels	6144
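
The figures in the table above can be cross-checked with a little arithmetic. The sketch below derives the token count, per-head dimension, and feedforward ratio directly from the table. The last step is a rough weight estimate that assumes a standard ViT block (four attention projections plus a two-layer MLP) and ignores biases, normalization layers, patch and position embeddings, and the segmentation head, so it is an approximation rather than the official parameter breakdown.

```python
# Values copied from the model card table.
IMAGE_H, IMAGE_W = 1024, 768
PATCH = 16
EMBED_DIM = 1536
NUM_LAYERS = 40
NUM_HEADS = 24
FFN_DIM = 6144

# Patch grid and sequence length at the native input size.
patches_h = IMAGE_H // PATCH          # 64 patches along the height
patches_w = IMAGE_W // PATCH          # 48 patches along the width
num_tokens = patches_h * patches_w    # 3072 patch tokens per image

# Attention and feedforward geometry.
head_dim = EMBED_DIM // NUM_HEADS     # 64 channels per attention head
mlp_ratio = FFN_DIM // EMBED_DIM      # feedforward is 4x the embedding width

# Assumption: standard ViT block with Q, K, V, and output projections
# (4 * d^2 weights) and a two-layer MLP (2 * d * ffn weights).
attn_weights = 4 * EMBED_DIM * EMBED_DIM
mlp_weights = 2 * EMBED_DIM * FFN_DIM
encoder_weights = NUM_LAYERS * (attn_weights + mlp_weights)

print(f"tokens per image: {num_tokens}")
print(f"head dimension:   {head_dim}")
print(f"encoder weights:  {encoder_weights / 1e9:.3f} B (approximate)")
```

Under these assumptions the encoder blocks account for roughly 1.13 B weights, which is consistent with the 1.169 B total listed in the table once the omitted components are added.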

More Resources

Repository: https://github.com/facebookresearch/sapiens
Paper: https://arxiv.org/abs/2408.12569
Demo: https://huggingface.co/spaces/facebook/sapiens-seg
Project Page: https://about.meta.com/realitylabs/codecavatars/sapiens
Additional Results: https://rawalkhirodkar.github.io/sapiens
HuggingFace Collection: https://huggingface.co/collections/facebook/sapiens-66d22047daa6402d565cb2fc

💻 Usage Examples

The Seg-Foreground 1B model segments foreground humans from images.
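
The page gives no inference code, so the following is only a hedged sketch of how the TorchScript file might be loaded and run with PyTorch. `torch.jit.load` is the standard way to load TorchScript checkpoints, but everything about the model's interface here is an assumption: that it accepts a normalized float tensor of shape (N, 3, 1024, 768), that it returns per-class logits of shape (N, C, h, w), and that class 0 is background. The helper names `input_shape`, `load_model`, and `predict_mask` are hypothetical. Check the Sapiens repository for the exact preprocessing and output format.

```python
from typing import Tuple

# Input resolution from the model card: 1024 x 768 (H x W).
INPUT_H, INPUT_W = 1024, 768


def input_shape(batch_size: int = 1) -> Tuple[int, int, int, int]:
    """Shape the model is assumed to expect: (N, 3, H, W)."""
    return (batch_size, 3, INPUT_H, INPUT_W)


def load_model(checkpoint_path: str, device: str = "cpu"):
    """Load a TorchScript checkpoint and switch it to eval mode (needs PyTorch)."""
    import torch  # imported lazily so input_shape() works without PyTorch

    model = torch.jit.load(checkpoint_path, map_location=device)
    return model.eval()


def predict_mask(model, image):
    """Return a boolean foreground mask at the resolution of the model output.

    `image` is assumed to be a float tensor of shape input_shape(), already
    resized and normalized as the Sapiens repository prescribes.
    """
    import torch

    with torch.no_grad():
        logits = model(image)
    # Assumption: logits have shape (N, C, h, w).
    if logits.shape[1] > 1:
        # Multi-channel output: treat class 0 as background (assumption).
        return logits.argmax(dim=1)[0] > 0
    # Single-channel output: threshold the sigmoid probability at 0.5.
    return torch.sigmoid(logits[0, 0]) > 0.5
```

A call would then look like `model = load_model("sapiens_1b_seg_foreground_epoch_8_torchscript.pt2")` followed by `mask = predict_mask(model, image)`. If the output resolution is lower than the input, the mask needs to be resized back to the original image size before use.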

📄 License

This model is released under the Creative Commons Attribution-NonCommercial 4.0 license.
