
Vit Small Patch16 224.dino

Developed by timm
An image feature model based on Vision Transformer (ViT), trained using the self-supervised DINO method, suitable for image classification and feature extraction tasks.
Downloads: 70.62k
Release time: 12/22/2022

Model Overview

This model is a Vision Transformer (ViT) image feature model trained with the self-supervised DINO method. It is used primarily for image classification and as a feature backbone for a range of computer vision tasks.

Model Features

Self-supervised learning
Trained with DINO (self-distillation with no labels), which learns useful visual representations from unlabeled images, so no labeled data is needed during pretraining.
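The core DINO objective can be illustrated with a small, dependency-free sketch. It assumes the formulation from the DINO paper: the teacher's output is centered by a running mean and sharpened with a low temperature, and the loss is the cross-entropy between the teacher and student distributions. The temperatures (0.1 for the student, 0.04 for the teacher) and the centering momentum of 0.9 are illustrative assumptions; multi-crop augmentation and the moving-average update of the teacher weights are left out.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    # Temperature-scaled log-softmax; subtracting the max keeps exp() stable.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def dino_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    center: Sequence[float],
    student_temp: float = 0.1,   # assumed value
    teacher_temp: float = 0.04,  # assumed value; lower than student_temp to sharpen
) -> float:
    # Teacher: subtract the running center, then sharpen with a low temperature.
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = softmax(centered, teacher_temp)
    student_log_probs = log_softmax(student_logits, student_temp)
    # Cross-entropy H(P_teacher, P_student); in training only the student gets gradients.
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))


def update_center(
    center: Sequence[float],
    teacher_batch: Sequence[Sequence[float]],
    momentum: float = 0.9,  # assumed value
) -> List[float]:
    # Moving average of the mean teacher output over the batch; centering helps prevent collapse.
    n = len(teacher_batch)
    batch_mean = [sum(row[i] for row in teacher_batch) / n for i in range(len(center))]
    return [momentum * c + (1 - momentum) * b for c, b in zip(center, batch_mean)]
```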
Efficient architecture
A ViT-Small model operating on 16×16 patches at a 224×224 input resolution, with 21.7M parameters and 4.3 GMACs per image, which keeps compute requirements moderate.
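The 21.7M figure can be checked by counting weights layer by layer. The sketch below assumes the standard ViT-Small layout used by timm (embedding width 384, 12 transformer blocks, MLP ratio 4, a biased qkv projection, pre-norm LayerNorms) and no classification head; these settings are assumptions, not values read from the checkpoint.

```python
from dataclasses import dataclass


@dataclass
class ViTConfig:
    img_size: int = 224
    patch_size: int = 16
    in_chans: int = 3
    embed_dim: int = 384   # assumed ViT-Small width (matches the 384-d features)
    depth: int = 12        # assumed number of transformer blocks
    mlp_ratio: int = 4     # assumed MLP expansion factor
    num_classes: int = 0   # assumption: backbone without a classifier head


def count_params(cfg: ViTConfig) -> int:
    d = cfg.embed_dim
    num_patches = (cfg.img_size // cfg.patch_size) ** 2  # 14 * 14 = 196
    hidden = d * cfg.mlp_ratio
    patch_embed = cfg.in_chans * cfg.patch_size ** 2 * d + d  # conv weight + bias
    cls_and_pos = d + (num_patches + 1) * d                   # CLS token + position embeddings
    block = (
        2 * d                    # LayerNorm before attention (weight + bias)
        + d * 3 * d + 3 * d      # fused qkv projection with bias
        + d * d + d              # attention output projection
        + 2 * d                  # LayerNorm before the MLP
        + d * hidden + hidden    # MLP fc1
        + hidden * d + d         # MLP fc2
    )
    final_norm = 2 * d
    head = d * cfg.num_classes + cfg.num_classes if cfg.num_classes else 0
    return patch_embed + cls_and_pos + cfg.depth * block + final_norm + head
```

With the default settings it returns 21,665,664, which rounds to 21.7M; a hypothetical 1000-class linear head would add another 385,000 parameters.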
Multi-task support
Can serve both as an image classifier and as a feature extraction backbone for downstream computer vision tasks.

Model Capabilities

Image feature extraction
Image classification
Computer vision task support

Use Cases

Computer vision
Image classification
Classifies input images and outputs a probability distribution over categories.
Performs well on the ImageNet-1k dataset
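Turning features into class probabilities requires a classification layer on top of the backbone. The sketch below assumes a hypothetical linear head (the weight and bias values are placeholders, not trained parameters), applies softmax to its logits to obtain a probability distribution, and then reads off the top-k classes.

```python
import math
from typing import List, Sequence, Tuple


def linear_head(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    # Hypothetical linear classifier: one weight row per class, logits = W f + b.
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    # Convert logits into probabilities; subtracting the max avoids overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs: Sequence[float], k: int) -> List[Tuple[int, float]]:
    # (class_index, probability) pairs sorted by descending probability.
    return sorted(enumerate(probs), key=lambda p: p[1], reverse=True)[:k]
```

In practice the features would be the 384-dimensional pooled embedding and the head would have one row per ImageNet-1k class.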
Feature extraction
Extracts deep feature representations of images for downstream tasks such as object detection and image retrieval.
Provides 384-dimensional feature vectors
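For retrieval, 384-dimensional feature vectors are commonly compared with cosine similarity. The sketch below ranks a gallery against a query; `fake_embeddings` is a hypothetical stand-in that generates random vectors in place of real model outputs, so the scores only demonstrate the ranking mechanics.

```python
import math
import random
from typing import List, Sequence, Tuple


def fake_embeddings(n: int, dim: int = 384, seed: int = 0) -> List[List[float]]:
    # Hypothetical placeholder for real 384-d model features.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_rank(
    query: Sequence[float], gallery: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    # Cosine similarity = dot product of L2-normalized vectors; highest score first.
    q = l2_normalize(query)
    scores = []
    for idx, item in enumerate(gallery):
        g = l2_normalize(item)
        scores.append((idx, sum(a * b for a, b in zip(q, g))))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Normalizing first makes the score independent of vector length, so a scaled copy of a gallery item still ranks first with a similarity of 1.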
© 2025 AIbase