Pvt Medium 224

Developed by Xrenya
PVT (Pyramid Vision Transformer) is a Transformer-based vision model that processes images through a pyramid of progressively smaller feature maps. This checkpoint is pre-trained on ImageNet-1K and is suited to image classification tasks.
Downloads 13
Release Time: 3/27/2023

Model Overview

This model is a convolution-free vision Transformer architecture that reduces computational costs through a progressive pyramid structure, primarily designed for image classification tasks.

Model Features

Pyramid Structure Design
Adopts a progressively shrinking pyramid structure to effectively reduce computation for large feature maps
Convolution-Free Architecture
Completely based on Transformer encoders, independent of traditional convolution operations
Global Context Modeling
Captures global image feature representations through [CLS] tokens
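
To make the cost saving of the pyramid concrete, the following is a minimal arithmetic sketch rather than the model's actual code. It assumes a 224x224 input, the per-stage strides (4, 8, 16, 32) and the key/value spatial-reduction ratios (8, 4, 2, 1) from the published PVT design. None of these values are read from this checkpoint, and the function name pyramid_summary is hypothetical. The extra [CLS] token is ignored for simplicity.

```python
# Illustrative arithmetic only. The strides and spatial-reduction ratios are
# assumptions based on the published PVT design, not values read from this checkpoint.
IMAGE_SIZE = 224
STRIDES = (4, 8, 16, 32)   # cumulative downsampling factor at each stage
SR_RATIOS = (8, 4, 2, 1)   # per-side reduction applied to keys/values in attention


def pyramid_summary(image_size=IMAGE_SIZE, strides=STRIDES, sr_ratios=SR_RATIOS):
    """Return token counts and attention-pair counts for each pyramid stage."""
    rows = []
    for stage, (stride, sr) in enumerate(zip(strides, sr_ratios), start=1):
        side = image_size // stride        # feature-map side length at this stage
        queries = side * side              # one query token per spatial position
        keys = (side // sr) ** 2           # keys/values after spatial reduction
        rows.append({
            "stage": stage,
            "feature_map": (side, side),
            "query_tokens": queries,
            "key_tokens": keys,
            "attention_pairs": queries * keys,          # with spatial reduction
            "full_attention_pairs": queries * queries,  # without spatial reduction
        })
    return rows


for row in pyramid_summary():
    print(row)
```

Under these assumptions the first stage holds 3136 query tokens but attends over only 49 reduced key tokens, which is where most of the saving on large feature maps comes from.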

Model Capabilities

Image Classification
Feature Extraction

Use Cases

Computer Vision
General Image Classification
Classifies images into 1000 ImageNet categories
Performs well on the ImageNet-1K dataset
Feature Extraction for Downstream Tasks
Serves as a backbone network to provide features for other vision tasks
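
For the classification use case, the model's output is a vector of 1000 scores, one per ImageNet-1K class. The sketch below shows one common way to turn such scores into ranked predictions with a softmax. It uses only the standard library, the helper name top_k is hypothetical, and the logits are stand-in values rather than real model output.

```python
import math


def top_k(logits, k=5):
    """Return the k highest-probability (class_index, probability) pairs."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Stand-in logits for 1000 classes (hypothetical values, not model output).
logits = [0.0] * 1000
logits[281] = 5.0
logits[285] = 3.0

for class_index, probability in top_k(logits, k=2):
    print(class_index, round(probability, 4))
```

Mapping a class index to a human-readable label requires the label list shipped with the checkpoint's configuration, which is not shown here.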