Vit Base Patch14 Dinov2.lvd142m

Developed by timm
A Vision Transformer (ViT) image feature model, pre-trained with the self-supervised DINOv2 method on the LVD-142M dataset.
Downloads 50.71k
Release Time: 5/9/2023

Model Overview

This model serves as a backbone network for image classification and feature extraction, employing the Vision Transformer architecture. Pre-trained on a large dataset through self-supervised learning, it can extract high-quality image feature representations.

Model Features

Self-supervised pre-training
Pre-trained with the DINOv2 self-supervised learning method on the LVD-142M dataset, so pre-training requires no manually annotated labels.
Large-scale image processing
Accepts image inputs of 518×518 pixels, a relatively high resolution that preserves finer visual detail.
Efficient feature extraction
A forward pass costs roughly 151.7 GMACs, a known compute budget to plan around when using the model as a feature extraction backbone.
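To make the input size concrete, here is a minimal sketch of the patch arithmetic. It assumes the "Patch14" in the model name means 14×14-pixel patches and that one class token is prepended, as in a standard ViT; the helper name `vit_token_count` is hypothetical and is not part of timm.

```python
def vit_token_count(image_size: int, patch_size: int, extra_tokens: int = 1) -> int:
    """Return the sequence length a ViT sees for a square image.

    extra_tokens=1 assumes a single class token (an assumption, not
    something the listing above states).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + extra_tokens


# 518 / 14 = 37 patches per side, so 37 * 37 patches plus one class token.
tokens_518 = vit_token_count(518, 14)
# For comparison, a 224x224 input with the same patch size.
tokens_224 = vit_token_count(224, 14)
print(tokens_518, tokens_224)  # 1370 257
```

The 518×518 input yields about five times as many tokens as a 224×224 input, which helps explain the GMACs figure quoted above.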

Model Capabilities

Image feature extraction
Image classification
Visual representation learning

Use Cases

Computer vision
Image classification
Can be used for image classification tasks such as object recognition and scene classification, typically by training a classification head on the extracted features.
Feature extraction
Can serve as a backbone for other vision tasks, supplying image feature representations to downstream models.
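The feature extraction use case might look like the following sketch. It assumes the `timm`, `torch`, and `Pillow` packages are installed and that the timm identifier is `vit_base_patch14_dinov2.lvd142m` (inferred from the title of this listing). The helper name `embed_image` is hypothetical, and pretrained weights are downloaded on first use.

```python
MODEL_NAME = "vit_base_patch14_dinov2.lvd142m"  # assumed timm identifier


def embed_image(image_path, pretrained=True):
    """Return a pooled feature tensor for one image file.

    Imports are local so that defining the function does not require
    the heavy dependencies to be installed.
    """
    import timm
    import torch
    from PIL import Image

    # num_classes=0 asks timm for pooled features instead of class logits.
    model = timm.create_model(MODEL_NAME, pretrained=pretrained, num_classes=0)
    model.eval()

    # Build the preprocessing pipeline (resize, crop, normalize) from the
    # model's own data configuration.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0)  # add a batch dimension

    with torch.no_grad():
        return model(batch)  # shape: (1, model.num_features)
```

Calling `embed_image("photo.jpg")`, where the path is a placeholder, should return one feature vector per image. Such vectors can feed a linear classifier or a nearest-neighbour search.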