Vit Large Patch14 Reg4 Dinov2.lvd142m

Developed by timm
A Vision Transformer (ViT) image feature model with registers, pre-trained with the self-supervised DINOv2 method on the LVD-142M dataset.
Downloads 119.48k
Release Time: 10/30/2023

Model Overview

This is an image feature extraction model based on the Vision Transformer (ViT) architecture. It is used mainly as a feature backbone and for image classification. Because it was pre-trained with self-supervised learning on the LVD-142M dataset, it produces high-quality image features without relying on labeled training data.

Model Features

Register Enhancement
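The model adds register tokens (four, per the "Reg4" in its name) to the Vision Transformer input sequence. These extra learnable tokens give the network a place to store global information, which improves how it handles image backgrounds and other irrelevant regions.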
Self-supervised Pre-training
Pre-trained with the DINOv2 self-supervised learning method on the LVD-142M dataset, so it learns strong visual features without manual annotations.
Large Input Size Support
Accepts image inputs of 518x518 pixels, larger than the 224x224 inputs common for ViT models, which lets it capture finer visual detail.
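
The input size and the model name together determine how many tokens the transformer processes per image. The sketch below works through that arithmetic. It assumes the patch size of 14 and the four registers implied by "Patch14" and "Reg4" in the model name, plus one class token, which is the usual ViT layout but is not stated on this page. The function name `token_counts` is hypothetical.

```python
def token_counts(image_size=518, patch_size=14, num_registers=4):
    """Return (grid side, patch tokens, total tokens) for a square input.

    Assumes one class token plus `num_registers` register tokens are
    added to the patch tokens (an assumption, not stated in the source).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    grid = image_size // patch_size          # patches along one side
    patches = grid * grid                    # patch tokens for the whole image
    total = 1 + num_registers + patches      # class + registers + patches
    return grid, patches, total


print(token_counts())  # (37, 1369, 1374)
```

Under these assumptions a 518x518 image is split into a 37x37 grid, giving 1369 patch tokens and 1374 tokens in total.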

Model Capabilities

Image feature extraction
Image classification
Visual representation learning

Use Cases

Computer Vision
Image Classification
Can be used for general image classification tasks such as object recognition and scene classification, typically by training a classifier on top of the extracted features.
Feature Extraction
Can serve as a backbone network for other vision tasks, providing high-quality image feature representations.
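
The use cases above can be sketched with the timm library as follows. This is a minimal example under assumptions, not the publisher's reference code: the identifier `vit_large_patch14_reg4_dinov2.lvd142m` is inferred from the page title, the helper name `extract_features` is hypothetical, and the heavy imports are placed inside the function so the module loads even where timm, torch, and Pillow are not installed.

```python
# Assumed timm identifier, inferred from the page title.
MODEL_NAME = "vit_large_patch14_reg4_dinov2.lvd142m"


def extract_features(image_path, pretrained=True):
    """Return (per-token features, pooled embedding) for one image file."""
    # Requires: pip install timm torch pillow
    import timm
    import torch
    from PIL import Image

    # num_classes=0 asks timm for a model without a classifier head,
    # so the forward pass yields features rather than class logits.
    model = timm.create_model(MODEL_NAME, pretrained=pretrained, num_classes=0)
    model.eval()

    # Build the preprocessing (resize, crop, normalize) from the model's config.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0)  # add a batch dimension

    with torch.no_grad():
        tokens = model.forward_features(batch)                # unpooled token features
        pooled = model.forward_head(tokens, pre_logits=True)  # one vector per image
    return tokens, pooled
```

Calling `extract_features("example.jpg")` on a local image (the file name is a placeholder) downloads the pre-trained weights on first use. The pooled vector can feed a downstream classifier, while the per-token features suit dense tasks that use the model as a backbone.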
© 2025 AIbase