Dino Vitb8

Developed by facebook
A Vision Transformer model trained with the self-supervised DINO method on 8x8 image patches, suitable for image feature extraction tasks
Downloads 1,664
Release Time: 3/2/2022

Model Overview

This model is a Vision Transformer (ViT) pretrained on the ImageNet-1k dataset with the DINO self-supervised method. It is intended primarily for image representation learning and can serve as a feature extractor for downstream vision tasks.

Model Features

Self-supervised learning
Uses the DINO self-supervised method to learn image features without manual annotation
8x8 image patch processing
Divides each image into non-overlapping 8x8 pixel patches, a finer grid than the more common 16x16 patches, which helps capture local features
Transformer architecture
Built on the Transformer encoder architecture (ViT-Base), which applies self-attention across the sequence of image patches
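
To make the patch size concrete, the sketch below computes how many tokens the encoder sees for a given input resolution. The 224x224 input size and the single prepended [CLS] token are assumptions based on the standard ViT setup, not figures stated on this page, and `token_count` is a hypothetical helper.

```python
def token_count(image_size: int, patch_size: int = 8, cls_tokens: int = 1) -> int:
    """Return the sequence length for a square image split into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + cls_tokens


# Assumed 224x224 input: 28 x 28 = 784 patches, plus one [CLS] token.
print(token_count(224, patch_size=8))   # 785
print(token_count(224, patch_size=16))  # 197
```

With 8x8 patches the sequence is roughly four times longer than with 16x16 patches, so the finer spatial detail comes at a higher self-attention cost.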

Model Capabilities

Image feature extraction
Image representation learning
Transfer learning for downstream vision tasks
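
A minimal sketch of feature extraction with the Hugging Face `transformers` library, assuming the checkpoint is published under the model ID `facebook/dino-vitb8` and that `transformers`, `torch`, and `Pillow` are installed. The helper name `extract_features` and any image path passed to it are hypothetical.

```python
MODEL_ID = "facebook/dino-vitb8"  # assumed Hugging Face model ID


def extract_features(image_path: str):
    """Return (cls_embedding, patch_embeddings) for one image."""
    # Imported lazily so this module loads even without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import ViTImageProcessor, ViTModel

    processor = ViTImageProcessor.from_pretrained(MODEL_ID)
    model = ViTModel.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)

    hidden = outputs.last_hidden_state  # shape: (1, num_tokens, hidden_size)
    cls_embedding = hidden[:, 0]        # global image representation
    patch_embeddings = hidden[:, 1:]    # one vector per image patch
    return cls_embedding, patch_embeddings
```

Calling `extract_features("example.jpg")` would download the weights on first use. The [CLS] embedding is a common choice for image-level tasks, while the per-patch embeddings are more useful when spatial layout matters.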

Use Cases

Computer vision
Image classification
Fine-tune by adding a classification head on top of the pretrained model
Object detection
Used as a feature extractor for object detection tasks
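
The classification use case amounts to placing a linear layer on top of the pretrained image embedding. The following is a dependency-free sketch of such a head, not the training code for this model. The 768-dimensional feature size is assumed from the ViT-Base configuration, and `LinearHead` is a hypothetical name. In practice the weights would be learned during fine-tuning.

```python
import math
import random


class LinearHead:
    """A single linear layer followed by softmax over class scores."""

    def __init__(self, in_features: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        # Small random weights; fine-tuning would learn these values.
        self.weights = [[rng.gauss(0.0, 0.02) for _ in range(in_features)]
                        for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def logits(self, features):
        return [sum(w * x for w, x in zip(row, features)) + b
                for row, b in zip(self.weights, self.bias)]

    def predict_proba(self, features):
        scores = self.logits(features)
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


# Stand-in for a [CLS] embedding produced by the pretrained encoder.
features = [0.1] * 768
head = LinearHead(in_features=768, num_classes=10)
probs = head.predict_proba(features)
print(len(probs), round(sum(probs), 6))
```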