Vision Perceiver Fourier

Developed by DeepMind
Perceiver IO is a general-purpose Transformer architecture capable of processing multiple modalities. This model is specifically designed for image classification tasks and pretrained on the ImageNet dataset.
Downloads 1,168
Release Time: 3/2/2022

Model Overview

This model classifies images by applying cross-attention directly to raw pixel values, without image patching. Pixel positions are encoded with fixed (non-learned) Fourier position embeddings.
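To make the idea of fixed Fourier position embeddings concrete, here is a minimal NumPy sketch. It assumes pixel coordinates normalised to [-1, 1] and frequency bands spaced linearly up to each axis's Nyquist frequency; the function name `fourier_position_features` and the sizes used are illustrative, not taken from the model's configuration.

```python
import numpy as np


def fourier_position_features(height, width, num_bands):
    """Build fixed (non-learned) Fourier position features for an image grid.

    Returns an array of shape (height, width, 2 + 4 * num_bands):
    the raw (y, x) position plus sin and cos features for each axis and band.
    """
    # Pixel coordinates normalised to [-1, 1] along each axis.
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    pos = np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1)  # (H, W, 2)

    # Assumption: linearly spaced frequencies from 1 up to the Nyquist
    # frequency (resolution / 2) of each axis.
    freqs = np.stack(
        [np.linspace(1.0, res / 2.0, num_bands) for res in (height, width)]
    )  # (2, num_bands)

    # Broadcast positions against frequencies, then flatten axis and band dims.
    angles = np.pi * pos[..., None] * freqs          # (H, W, 2, num_bands)
    angles = angles.reshape(height, width, -1)       # (H, W, 2 * num_bands)

    return np.concatenate([pos, np.sin(angles), np.cos(angles)], axis=-1)


features = fourier_position_features(height=8, width=6, num_bands=4)
print(features.shape)
```

In a setup like this, the features would typically be concatenated with each pixel's colour channels before the cross-attention step, so every pixel carries its own location.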

Model Features

Modality-Agnostic Architecture
Core design applicable to various data types including text, images, and audio.
Efficient Attention Mechanism
Runs self-attention over a fixed-size set of latent vectors, so its computational cost does not depend on the input size.
Raw Pixel Processing
Directly processes raw pixel values without ViT-style image patching preprocessing.
Flexible Decoding
Supports multiple output formats and tasks through decoding query mechanisms.
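The features above can be sketched as an encode, process, decode loop. This is a simplified illustration, not the model's implementation: it uses single-head attention with no learned projections, MLPs, residual connections, or layer normalisation, and all sizes and names (`attend`, `num_latents`, `decoder_query`) are assumptions chosen for readability.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values):
    """Single-head scaled dot-product attention without learned projections."""
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ keys_values


rng = np.random.default_rng(0)
num_pixels, num_latents, channels = 32 * 32, 16, 8  # illustrative sizes only

pixels = rng.normal(size=(num_pixels, channels))     # flattened raw inputs
latents = rng.normal(size=(num_latents, channels))   # small latent array
decoder_query = rng.normal(size=(1, channels))       # one query per desired output

# Encode: latents attend to every pixel; cost grows as num_latents * num_pixels.
latents = attend(latents, pixels)

# Process: self-attention among latents only; cost grows as num_latents ** 2,
# regardless of how many pixels were read in.
for _ in range(2):
    latents = attend(latents, latents)

# Decode: each decoder query attends to the latents and yields one output vector.
output = attend(decoder_query, latents)
print(latents.shape, output.shape)
```

Changing `num_pixels` alters only the cost of the encode step, and adding rows to `decoder_query` changes the number of outputs without touching the rest of the pipeline.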

Model Capabilities

Image Classification
Feature Extraction

Use Cases

Computer Vision
Image Classification
Performs 1000-class ImageNet classification on input images.
79.0% top-1 accuracy on ImageNet-1k
Transfer Learning
Used as a pretrained model for downstream vision tasks.
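For the image classification use case, a minimal sketch with the Hugging Face `transformers` library might look like the following. The checkpoint name `deepmind/vision-perceiver-fourier` is an assumption based on the model's title and developer; the helper names `top_k_labels` and `classify` are hypothetical. Running `classify` requires `torch`, `transformers`, and network access to download the weights.

```python
def top_k_labels(logits, id2label, k=5):
    """Return the k highest-scoring (label, logit) pairs from a flat list of logits."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(id2label[i], logits[i]) for i in ranked[:k]]


def classify(image, checkpoint="deepmind/vision-perceiver-fourier", k=5):
    """Classify a PIL image and return its top-k ImageNet labels.

    The checkpoint name is an assumption; substitute the identifier you use.
    """
    # Imported here so top_k_labels stays usable without these packages.
    import torch
    from transformers import (
        PerceiverForImageClassificationFourier,
        PerceiverImageProcessor,
    )

    processor = PerceiverImageProcessor.from_pretrained(checkpoint)
    model = PerceiverForImageClassificationFourier.from_pretrained(checkpoint)
    model.eval()

    # The processor resizes and normalises the image; no patching is applied.
    pixel_values = processor(image, return_tensors="pt").pixel_values
    with torch.no_grad():
        logits = model(inputs=pixel_values).logits  # one score per class

    return top_k_labels(logits[0].tolist(), model.config.id2label, k)
```

A call such as `classify(Image.open("photo.jpg"))`, with `Image` imported from Pillow and `photo.jpg` standing in for your own file, would return a list of label and score pairs.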