Pvt Tiny 224

Developed by Xrenya
Pyramid Vision Transformer (PVT) is a vision model built on the transformer architecture. This checkpoint is intended for image classification.
Downloads 25
Release Time: 3/25/2023

Model Overview

This model is pretrained and fine-tuned on the ImageNet-1K dataset at 224x224 resolution and classifies images into 1000 categories. Its pyramid structure reduces computational cost and produces feature maps at several scales, which also makes the architecture suitable as a backbone for dense prediction tasks.
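As a rough illustration of the classification workflow described above, the following sketch loads the checkpoint through the Hugging Face transformers library and returns the most probable labels. The checkpoint id "Xrenya/pvt-tiny-224" is an assumption inferred from the model name and developer on this page, and the helper names top_k and classify are hypothetical. It also assumes torch, transformers, and Pillow are installed.

```python
import math


def top_k(logits, id2label, k=5):
    """Return the k most probable (label, probability) pairs via softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(id2label[i], exps[i] / total) for i in ranked[:k]]


def classify(image_path, checkpoint="Xrenya/pvt-tiny-224", k=5):
    """Classify one image; the checkpoint id is an assumption, not confirmed here."""
    # Heavy dependencies are imported lazily so top_k stays usable without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(checkpoint)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # one score per ImageNet-1K class
    return top_k(logits.tolist(), model.config.id2label, k)
```

Calling classify("cat.jpg") with a local image path of your own would return a list of (label, probability) pairs, highest probability first.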

Model Features

Pyramid Structure
Uses a progressive shrinking pyramid to reduce computational costs and improve efficiency in processing large feature maps.
Transformer Encoder
Built on the transformer architecture; captures global image context through self-attention.
CLS Token Classification
Uses the [CLS] token as a holistic representation of the image, facilitating classification tasks.
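To make the cost saving of the shrinking pyramid concrete, the sketch below counts tokens per stage for a 224x224 input. The stage strides (4, 8, 16, 32) and the spatial-reduction ratios applied to attention keys and values (8, 4, 2, 1) are assumptions taken from the general PVT design, not values stated on this page, and stage_stats is a hypothetical helper name.

```python
# Assumed PVT-Tiny settings (from the general PVT design, not from this page).
IMAGE_SIZE = 224
STRIDES = [4, 8, 16, 32]   # cumulative downsampling factor at each stage
SR_RATIOS = [8, 4, 2, 1]   # spatial reduction applied to keys/values per stage


def stage_stats(image_size=IMAGE_SIZE, strides=STRIDES, sr_ratios=SR_RATIOS):
    """Count query tokens, reduced key tokens, and attention pairs per stage."""
    stats = []
    for stride, sr in zip(strides, sr_ratios):
        side = image_size // stride      # feature map is side x side
        queries = side * side            # one query token per position
        keys = (side // sr) ** 2         # keys/values after spatial reduction
        stats.append({
            "side": side,
            "queries": queries,
            "keys": keys,
            "full_pairs": queries * queries,   # plain self-attention
            "reduced_pairs": queries * keys,   # with reduced keys/values
        })
    return stats


for index, stage in enumerate(stage_stats(), start=1):
    print(f"stage {index}: {stage['side']}x{stage['side']} map, "
          f"{stage['queries']} queries, {stage['keys']} keys, "
          f"{stage['full_pairs']} vs {stage['reduced_pairs']} attention pairs")
```

Under these assumptions the feature map shrinks from 56x56 to 7x7 across the four stages, so the later stages attend over far fewer tokens than the first.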

Model Capabilities

Image Classification
Feature Extraction

Use Cases

Computer Vision
Image Classification
Classifies input images into 1000 ImageNet categories.
Performs well on the ImageNet-1K dataset.
Feature Extraction
Extracts image features for downstream tasks.
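For feature extraction, one possible approach is to run the backbone without the classification head and keep the final token sequence. The sketch below assumes the Hugging Face transformers library, the checkpoint id "Xrenya/pvt-tiny-224" (inferred from this page, not confirmed by it), and that the [CLS] token sits first in the output sequence. The helper names split_cls and extract_features are hypothetical.

```python
def split_cls(tokens):
    """Split a token sequence into (cls_vector, patch_vectors).

    Assumes the [CLS] token is the first element of the sequence.
    """
    return tokens[0], tokens[1:]


def extract_features(image_path, checkpoint="Xrenya/pvt-tiny-224"):
    """Return the [CLS] vector and patch vectors for one image."""
    # Heavy dependencies are imported lazily so split_cls works without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)  # backbone only, no classifier
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (tokens, hidden_size)
    return split_cls(hidden.tolist())
```

The [CLS] vector can serve as a compact image embedding for tasks such as retrieval or clustering, while the patch vectors keep coarse spatial information.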