
DeiT Base Distilled Patch16 384

Developed by Facebook
A distilled Vision Transformer (ViT), pre-trained on ImageNet-1k at 224x224 resolution and fine-tuned on ImageNet-1k at 384x384 resolution, that learns from a teacher model through a dedicated distillation token.
Downloads 1,824
Release Time: 3/2/2022

Model Overview

This model is a distilled Vision Transformer (ViT) for image classification. It learns from a teacher CNN through a distillation token and supports high-resolution (384x384) inputs.
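The input pipeline of such a model can be sketched in plain NumPy: the image is cut into non-overlapping 16x16 patches, each patch is flattened and embedded, and two special tokens (the class token and the distillation token) are prepended. The identity "projection" below is an illustrative stand-in for the model's learned linear patch embedding; the dimensions match ViT-Base (embedding width 768).

```python
import numpy as np

# Sketch of how a 384x384 input becomes the transformer's token sequence
# in a distilled ViT. Patch size 16 follows the model name; 768 is the
# standard ViT-Base embedding width.
IMAGE_SIZE = 384
PATCH_SIZE = 16
EMBED_DIM = 768

def build_token_sequence(image: np.ndarray) -> np.ndarray:
    """Split the image into non-overlapping patches, flatten them, and
    prepend the class token and the distillation token."""
    assert image.shape == (IMAGE_SIZE, IMAGE_SIZE, 3)
    n_side = IMAGE_SIZE // PATCH_SIZE            # 24 patches per side
    patches = image.reshape(n_side, PATCH_SIZE, n_side, PATCH_SIZE, 3)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(n_side * n_side, -1)
    # In the real model a learned linear layer maps each flattened patch
    # (16 * 16 * 3 = 768 values) to EMBED_DIM; identity here for illustration.
    tokens = patches.astype(np.float32)
    cls_token = np.zeros((1, EMBED_DIM), dtype=np.float32)
    dist_token = np.zeros((1, EMBED_DIM), dtype=np.float32)
    return np.concatenate([cls_token, dist_token, tokens], axis=0)

seq = build_token_sequence(np.zeros((384, 384, 3)))
print(seq.shape)  # (578, 768): 24*24 = 576 patch tokens + 2 special tokens
```

At 384x384 the sequence is 578 tokens long, versus 198 at the 224x224 pre-training resolution, which is why fine-tuning at the higher resolution is a separate step.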

Model Features

Distillation Learning
Learns from a teacher CNN through a dedicated distillation token, which improves classification performance.
High-Resolution Support
Processes images at 384x384 resolution, improving classification accuracy over the 224x224 pre-training resolution.
Data Efficiency
Pre-trained and fine-tuned solely on ImageNet-1k, demonstrating data-efficient training.
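The distillation feature can be made concrete with a small sketch of hard-label distillation as described in the DeiT paper: the class token's logits are supervised by the ground-truth label, the distillation token's logits by the teacher's argmax prediction, and the two cross-entropy terms are weighted equally. All values below are toy illustrations, not the model's actual training configuration.

```python
import numpy as np

def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Softmax cross-entropy for a single example, computed stably."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def hard_distillation_loss(cls_logits, dist_logits, true_label, teacher_logits):
    """Hard-label distillation: the class head is supervised by the
    ground-truth label, the distillation head by the teacher's argmax
    prediction, weighted equally."""
    teacher_label = int(np.argmax(teacher_logits))
    return 0.5 * cross_entropy(cls_logits, true_label) \
         + 0.5 * cross_entropy(dist_logits, teacher_label)

# Toy 4-class example: the student's two heads agree with both targets,
# so the loss is small (close to zero).
loss = hard_distillation_loss(
    cls_logits=np.array([5.0, 0.0, 0.0, 0.0]),
    dist_logits=np.array([0.0, 5.0, 0.0, 0.0]),
    true_label=0,
    teacher_logits=np.array([0.0, 9.0, 0.0, 0.0]),
)
print(loss)
```

At inference time the two heads are no longer trained separately; their logits are typically averaged to produce the final prediction.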

Model Capabilities

Image Classification
High-Resolution Image Processing

Use Cases

Computer Vision
ImageNet Image Classification
Classifies images into one of the 1,000 ImageNet classes.
Reported top-1 accuracy: 85.2%; top-5 accuracy: 97.2%.
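Metrics like those above can be reproduced on any labelled evaluation set with a simple top-k accuracy routine. The toy logits and labels below are illustrative only, not ImageNet results.

```python
import numpy as np

def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of examples whose true label is among the k highest-scoring
    classes. logits: (N, C) class scores, labels: (N,) true class indices."""
    topk = np.argsort(logits, axis=1)[:, -k:]        # indices of k best classes
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy batch: 3 examples, 5 classes.
logits = np.array([
    [0.1, 0.9, 0.0, 0.0, 0.0],   # true class 1 ranked first  -> top-1 hit
    [0.8, 0.1, 0.7, 0.0, 0.0],   # true class 2 ranked second -> top-1 miss
    [0.0, 0.0, 0.0, 0.1, 0.9],   # true class 0 ranked last   -> top-1 miss
])
labels = np.array([1, 2, 0])
print(topk_accuracy(logits, labels, 1))  # 1/3 of examples correct at top-1
print(topk_accuracy(logits, labels, 5))  # all correct when k covers every class
```

Top-5 accuracy is always at least as high as top-1, which matches the gap between the two figures reported for this model.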