PE Spatial G14 448

Developed by facebook
The Perception Encoder (PE) is a state-of-the-art image and video understanding encoder trained through simple vision-language learning.
Downloads 3,256
Release Time: 4/11/2025

Model Overview

The Perception Encoder (PE) is a family of large-scale vision encoder models reported to achieve state-of-the-art performance across a wide range of vision tasks. It is trained with a robust contrastive pre-training scheme and then fine-tuned on synthetically aligned videos. According to its developers, PE outperforms existing models on classification and retrieval, and its internal layers produce strong, general-purpose features that can be adapted to downstream tasks.
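
To make the idea of reading features from inside the network concrete, here is a minimal sketch in plain Python. It is not the PE API: `forward_with_taps` and the toy layers are hypothetical stand-ins, and a real encoder would use transformer blocks and tensors instead of lists of floats.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def forward_with_taps(
    layers: Sequence[Layer], x: Vector, tap_indices: Sequence[int]
) -> Dict[int, Vector]:
    """Run x through the layers in order and keep the outputs of the tapped layers."""
    taps: Dict[int, Vector] = {}
    for i, layer in enumerate(layers):
        x = layer(x)
        if i in tap_indices:
            taps[i] = list(x)  # copy so the stored features stay independent
    return taps


# Toy stand-in layers (hypothetical; chosen only so the arithmetic is easy to follow).
toy_layers = [
    lambda v: [a + 1.0 for a in v],
    lambda v: [a * 2.0 for a in v],
    lambda v: [a - 3.0 for a in v],
]

# Keep the output of the middle layer (index 1) and the final layer (index 2).
features = forward_with_taps(toy_layers, [1.0, 2.0], tap_indices=[1, 2])
```

A downstream head can then be attached to `features[1]` instead of `features[2]` if the intermediate representation proves more useful for the task.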

Model Features

Intermediate Feature Extraction
Extracts features from the model's intermediate layers rather than from the output layer, since the intermediate layers are reported to yield stronger visual embeddings.
SAM Optimization
Optimized using SAM 2.1's mask-based learning strategy to enhance performance in dense prediction tasks.
Fine Semantic Correspondence
The feature space exhibits fine semantic correspondences, enabling the identification of relationships between object parts.
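
One common way to probe semantic correspondence in a feature space is nearest-neighbor matching by cosine similarity between patch features of two images. The sketch below illustrates that general technique, not PE's own evaluation code. The function names and the tiny 3-dimensional feature vectors are made up for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_patches(source: Sequence[Vector], target: Sequence[Vector]) -> List[int]:
    """For each source patch, return the index of the most similar target patch."""
    matches = []
    for s in source:
        sims = [cosine_similarity(s, t) for t in target]
        matches.append(max(range(len(sims)), key=sims.__getitem__))
    return matches


# Made-up patch features: two patches from image A, three from image B.
image_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_b = [[0.1, 0.9, 0.0], [0.0, 0.0, 1.0], [0.8, 0.1, 0.1]]

matches = match_patches(image_a, image_b)  # [2, 0]
```

Here patch 0 of image A pairs with patch 2 of image B, and patch 1 pairs with patch 0. With real encoder features, matched patches would ideally land on the same object part in both images.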

Model Capabilities

Image feature extraction
Dense prediction task processing
Semantic correspondence analysis
Visual understanding
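
For dense prediction, patch tokens are usually rearranged into a 2D feature map before a task head is applied. The sketch below assumes that "G14 448" in the model name denotes a 14-pixel patch size and a 448-pixel square input, that tokens are stored in row-major order, and that no extra class token is present. The listing does not confirm any of these details. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def patch_grid(image_size: int, patch_size: int) -> Tuple[int, int]:
    """Number of patches along each side of a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side, side


def tokens_to_grid(tokens: Sequence[T], height: int, width: int) -> List[List[T]]:
    """Reshape a flat, row-major token sequence into a height x width grid."""
    if len(tokens) != height * width:
        raise ValueError("token count does not match the grid size")
    return [list(tokens[r * width:(r + 1) * width]) for r in range(height)]


# Assumed reading of the model name: 448-pixel input, 14-pixel patches.
height, width = patch_grid(448, 14)  # 32 x 32 patches
tokens = list(range(height * width))  # placeholder token ids instead of feature vectors
grid = tokens_to_grid(tokens, height, width)
```

Under these assumptions each image yields 1,024 patch tokens, and `grid[row][col]` holds the token for the patch at that spatial position.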

Use Cases

Computer Vision
Image Classification
Used for image classification tasks
Reported to achieve state-of-the-art classification performance
Object Detection
Used for dense prediction tasks such as object detection and segmentation
Reported to perform strongly on the ADE20k, LVIS, and COCO benchmarks