Vision Perceiver Learned

Developed by DeepMind
A general-purpose Perceiver vision model pre-trained on ImageNet, using learned position embeddings to process image inputs
Downloads 1,894
Release Time: 3/2/2022

Model Overview

This model is a Transformer encoder that can be applied to any modality; this checkpoint is configured for image classification and learns image representations directly from raw pixel values
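A minimal usage sketch for the classification head, assuming the Hugging Face transformers and Pillow libraries and the Hub checkpoint name deepmind/vision-perceiver-learned; the example image URL is only illustrative:

```python
from transformers import AutoImageProcessor, PerceiverForImageClassificationLearned
from PIL import Image
import requests

# Load an example image (URL is illustrative)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor resizes, center-crops, and normalizes the image into pixel values
processor = AutoImageProcessor.from_pretrained("deepmind/vision-perceiver-learned")
model = PerceiverForImageClassificationLearned.from_pretrained("deepmind/vision-perceiver-learned")

encoding = processor(images=image, return_tensors="pt")
outputs = model(inputs=encoding.pixel_values)

# One logit per ImageNet-1k class; take the argmax as the prediction
predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```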

Model Features

Modality-Agnostic Architecture
Applicable to various data modalities including text, images, audio, and video
Efficient Attention Mechanism
A fixed-size array of latent vectors cross-attends to the inputs, so attention cost no longer scales quadratically with input size (see the sketch after this list)
Learned Position Embeddings
Uses only learned 1D position embeddings, without relying on prior knowledge of the 2D structure of images
Flexible Decoding Mechanism
Decoder queries allow the latents to be decoded into outputs of arbitrary size and semantics
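The efficiency claim can be illustrated with a toy sketch (not the DeepMind implementation): a learned latent array of fixed size cross-attends to a flattened input, so the attention cost grows linearly with the number of input elements rather than quadratically. All sizes below are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class LatentCrossAttention(nn.Module):
    def __init__(self, num_latents=256, latent_dim=512, input_dim=3):
        super().__init__()
        # Learned latent array: (num_latents, latent_dim), independent of input size
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        self.attn = nn.MultiheadAttention(
            latent_dim, num_heads=8, kdim=input_dim, vdim=input_dim, batch_first=True
        )

    def forward(self, x):
        # x: (batch, num_inputs, input_dim), e.g. flattened pixels
        # (the real model also concatenates position embeddings to each input element)
        queries = self.latents.unsqueeze(0).expand(x.shape[0], -1, -1)
        # Cross-attention: O(num_latents * num_inputs) instead of O(num_inputs ** 2)
        out, _ = self.attn(queries, x, x)
        return out  # (batch, num_latents, latent_dim)

# Toy check: a 224x224 RGB image flattened to 50,176 pixel vectors of dimension 3
pixels = torch.randn(1, 224 * 224, 3)
latents = LatentCrossAttention()(pixels)
print(latents.shape)  # torch.Size([1, 256, 512])
```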

Model Capabilities

Image Classification
Feature Extraction

Use Cases

Computer Vision
Image Classification
Classify input images into 1000 categories
Achieves 72.7% Top-1 accuracy on ImageNet-1k
Feature Extraction
Extract image features for downstream tasks
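For feature extraction, one possible approach (a sketch, not prescribed by the model card) is to reuse the classification checkpoint and read the final latent states via the standard transformers output_hidden_states flag; the local file name example.jpg is hypothetical, and the pooled feature shape depends on the checkpoint's number of latents and latent dimension.

```python
import torch
from transformers import AutoImageProcessor, PerceiverForImageClassificationLearned
from PIL import Image

processor = AutoImageProcessor.from_pretrained("deepmind/vision-perceiver-learned")
model = PerceiverForImageClassificationLearned.from_pretrained("deepmind/vision-perceiver-learned")

image = Image.open("example.jpg")  # hypothetical local file
encoding = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(inputs=encoding.pixel_values, output_hidden_states=True)

# The last hidden state holds the latent array after the final block:
# (batch, num_latents, d_latents). Mean-pool it as a fixed-size feature vector.
features = outputs.hidden_states[-1].mean(dim=1)
print(features.shape)
```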