DINO ViT-B/16 (dino-vitb16)

Developed by Facebook AI Research
A Vision Transformer (ViT) model pretrained on the ImageNet-1k dataset using the DINO self-supervised method.
Downloads 122.46k
Release date: 3/2/2022

Model Overview

This model is pretrained on the ImageNet-1k dataset through self-supervised learning and extracts image features for downstream vision tasks. It splits each image into 16×16 pixel patches and ships without a fine-tuned task head.
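The patch layout above determines the Transformer's input sequence length. A minimal sketch of that arithmetic, assuming the standard ViT-B/16 defaults (224×224 input, 16×16 patches, 768-dim hidden size):

```python
# ViT-B/16 defaults: 224x224 input, 16x16 patches, 768-dim hidden states.
image_size, patch_size, hidden_size = 224, 16, 768

patches_per_side = image_size // patch_size  # 224 / 16 = 14
num_patches = patches_per_side ** 2          # 14 * 14 = 196 patch tokens
seq_len = num_patches + 1                    # +1 for the [CLS] token -> 197

print(num_patches, seq_len)  # 196 197
```

The extra `[CLS]` token is the one typically used as a whole-image feature for downstream tasks.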

Model Features

Self-supervised Learning
Uses the DINO method for self-supervised training, learning image features without manual annotations.
ViT Architecture
Processes images with a Transformer encoder, splitting each image into 16×16 pixel patches that are treated as a token sequence.
General Feature Extraction
The pretrained model can extract general image features suitable for various downstream vision tasks.
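To make the "16×16 patch segmentation" concrete, here is a small numpy sketch (not the model's actual code) that cuts an image into non-overlapping 16×16 patches and flattens each into a vector, which is the sequence a ViT encoder consumes after a linear projection:

```python
import numpy as np

def patchify(img: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches.

    H and W must be divisible by `patch`. Returns (num_patches, patch*patch*C).
    """
    H, W, C = img.shape
    x = img.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)            # (H/p, W/p, p, p, C)
    return x.reshape(-1, patch * patch * C)   # one row per patch

img = np.zeros((224, 224, 3))     # dummy 224x224 RGB image
tokens = patchify(img)
print(tokens.shape)               # (196, 768): 14*14 patches, 16*16*3 values each
```

In the real model, each flattened patch is then linearly projected into the 768-dim embedding space before entering the Transformer.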

Model Capabilities

Image Feature Extraction
Image Classification (requires adding a classification head)
Visual Representation Learning

Use Cases

Computer Vision
Image Classification
Add a linear layer on top of the model for image classification tasks.
Feature Extraction
Extract image features for downstream tasks such as object detection and segmentation.
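The classification use case above amounts to putting a single linear layer on the backbone's output feature. A minimal numpy sketch, assuming a 768-dim `[CLS]` feature (the random vectors below stand in for a real backbone output and trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, num_classes = 768, 1000

# Stand-in for the [CLS] feature produced by the frozen DINO backbone.
cls_feature = rng.standard_normal(hidden_size)

# A linear classification head: logits = W @ feature + b.
W = rng.standard_normal((num_classes, hidden_size)) * 0.01
b = np.zeros(num_classes)

logits = W @ cls_feature + b
pred = int(np.argmax(logits))
print(logits.shape, pred)
```

In practice this head would be trained (e.g. by linear probing) while the pretrained backbone stays frozen, which is the standard evaluation protocol for DINO features.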