
vit_base_patch16_224.dino

Developed by timm
A Vision Transformer (ViT) image feature model trained with the self-supervised DINO method, suitable for image classification and feature extraction tasks.
Downloads 33.45k
Release Time: 12/22/2022

Model Overview

This model is a Vision Transformer trained with the DINO self-supervised learning method, primarily used for image classification and as a backbone network for feature extraction.
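A minimal usage sketch follows, assuming a recent timm version that provides resolve_model_data_config and create_transform; the file name example.jpg is a hypothetical local image, not part of the model release.

```python
import timm
import torch
from PIL import Image

# Instantiate the pretrained DINO ViT-B/16 backbone (weights download on first use).
model = timm.create_model('vit_base_patch16_224.dino', pretrained=True).eval()

# Build the matching preprocessing pipeline (resize, crop, normalize) from the pretrained config.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg').convert('RGB')  # hypothetical local image
x = transform(img).unsqueeze(0)                 # shape: (1, 3, 224, 224)

with torch.no_grad():
    out = model(x)  # forward pass through the DINO backbone
```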

Model Features

Self-supervised Learning
Uses the DINO method for self-supervised training, enabling effective visual representation learning without extensive labeled data.
Vision Transformer Architecture
Adopts the standard ViT-B/16 architecture, processing images by dividing them into 16x16 patches.
Efficient Feature Extraction
Can serve as a backbone network for feature extraction, outputting 768-dimensional feature vectors (see the sketch after this list).
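The sketch below illustrates the feature-extraction path under the same timm assumptions as above; the random tensor stands in for a preprocessed image batch. Creating the model with num_classes=0 makes the forward pass return the pooled 768-dimensional embedding, while forward_features exposes the full token sequence.

```python
import timm
import torch

# Headless backbone: num_classes=0 makes the forward pass return pooled features.
model = timm.create_model('vit_base_patch16_224.dino', pretrained=True, num_classes=0).eval()

x = torch.randn(1, 3, 224, 224)  # dummy batch; a 224x224 image splits into 14x14 = 196 patches
with torch.no_grad():
    pooled = model(x)                   # (1, 768) image-level embedding
    tokens = model.forward_features(x)  # (1, 197, 768): class token + 196 patch tokens

print(pooled.shape, tokens.shape)
```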

Model Capabilities

Image Classification
Image Feature Extraction
Visual Representation Learning

Use Cases

Computer Vision
Image Classification
Classifies images into ImageNet-1k categories, typically by fine-tuning or linear-probing a classification head on top of the DINO features.
Feature Extraction
Extracts high-level image features for downstream tasks such as object detection and image retrieval (see the retrieval sketch after this list).
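For the retrieval use case, one possible sketch is to compare L2-normalized pooled embeddings with cosine similarity; the file names query.jpg and candidate.jpg are hypothetical.

```python
import timm
import torch
import torch.nn.functional as F
from PIL import Image

model = timm.create_model('vit_base_patch16_224.dino', pretrained=True, num_classes=0).eval()
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

def embed(path: str) -> torch.Tensor:
    """Return an L2-normalized 768-dim DINO embedding for one image file."""
    img = Image.open(path).convert('RGB')
    with torch.no_grad():
        feat = model(transform(img).unsqueeze(0))  # (1, 768)
    return F.normalize(feat, dim=-1)

# Hypothetical image files; a higher cosine similarity suggests more similar content.
sim = (embed('query.jpg') * embed('candidate.jpg')).sum().item()
print(f'cosine similarity: {sim:.3f}')
```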