PE Core L14 336

Developed by facebook
A large-scale vision encoder from Meta that combines contrastive pre-training with fine-tuning on synthetically aligned video data and reports state-of-the-art results across a range of vision tasks.
Downloads 11.52k
Release Time: 4/11/2025

Model Overview

The Perception Encoder is a family of image and video understanding encoders trained with a robust contrastive pre-training scheme and then fine-tuned on synthetically aligned videos. The encoders outperform existing models on classification and retrieval tasks, and their internal features generalize well to other tasks.

Model Features

Internal Feature Generalization
The model's internal features generalize well and can be applied to a variety of downstream tasks.
Alignment Tuning
Alignment tuning unlocks the transfer potential of large-scale contrastive pre-training, so that its general-purpose features can be fully used downstream.
Multiple Model Scales
Available in three scales (B/16, L/14, G/14) to fit different compute budgets.

Model Capabilities

Zero-shot image classification
Zero-shot video classification
Image-text retrieval
Video-text retrieval
Cross-modal feature extraction
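
The zero-shot classification capability listed above can be illustrated with a minimal sketch. It assumes the usual contrastive-encoder recipe: embed the image and one text prompt per class, compare them by cosine similarity, and apply a softmax. The embeddings, prompts, function names, and the `logit_scale` value below are hypothetical placeholders for illustration; they are not outputs of PE Core and not part of its API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Pick the label whose text embedding is most similar to the image embedding.

    logit_scale is an assumed temperature; a real model supplies its own value.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)

    # Numerically stable softmax over the class logits.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Toy 3-dimensional embeddings (hypothetical, for illustration only).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.2]

predicted, probs = zero_shot_classify(image_emb, text_embs, labels)
print(predicted)
```

In practice the embeddings would come from the model's image and text towers. No fine-tuning is involved, because the class set is defined entirely by the text prompts.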

Use Cases

Visual Content Understanding
Image Classification
Accurately classifies images without fine-tuning.
Achieves 85.4% accuracy on ImageNet-1k.
Cross-modal Retrieval
Enables efficient retrieval between images/videos and text.
Achieves 58.1% recall on COCO-T2I.
Video Analysis
Video Action Recognition
Identifies action categories in videos.
Achieves 76.9% accuracy on Kinetics-400.
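
The retrieval figure above is reported as "recall" without a cutoff. Retrieval benchmarks are conventionally scored with recall@k, so the sketch below assumes that metric. The similarity matrix and ground-truth indices are made-up toy values, and `recall_at_k` is a hypothetical helper, not part of any PE Core tooling.

```python
def recall_at_k(similarity, ground_truth, k=1):
    """Fraction of queries whose correct candidate appears in the top-k results.

    similarity[i][j] is the score of query i against candidate j;
    ground_truth[i] is the index of the correct candidate for query i.
    """
    hits = 0
    for scores, target in zip(similarity, ground_truth):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy text-to-image similarity scores (hypothetical): 3 text queries, 3 images.
similarity = [
    [0.9, 0.1, 0.3],  # correct image 0 is ranked first
    [0.2, 0.4, 0.8],  # correct image 1 is ranked second
    [0.1, 0.2, 0.7],  # correct image 2 is ranked first
]
ground_truth = [0, 1, 2]

r_at_1 = recall_at_k(similarity, ground_truth, k=1)
r_at_2 = recall_at_k(similarity, ground_truth, k=2)
print(r_at_1, r_at_2)
```

With real embeddings, each similarity entry would be the cosine similarity between a text embedding and an image (or video) embedding.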