
vit_large_patch14_dinov2.lvd142m

Developed by timm
A self-supervised image feature model based on Vision Transformer (ViT), pre-trained using the DINOv2 method on the LVD-142M dataset, suitable for image classification and feature extraction tasks.
Downloads 32.01k
Release Date: 5/9/2023

Model Overview

This is an image feature extraction model based on the Vision Transformer architecture. It was pre-trained on the large-scale LVD-142M dataset through self-supervised learning and produces high-quality image feature representations for a variety of computer vision tasks.
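The snippet below is a minimal sketch of how the checkpoint might be loaded through timm to extract image features; the model name follows the title above, and passing num_classes=0 (to keep the model headless) is an assumption based on common timm usage.

    import timm
    import torch

    # Minimal sketch: load the pretrained checkpoint via timm and extract image features.
    model = timm.create_model('vit_large_patch14_dinov2.lvd142m', pretrained=True, num_classes=0)
    model = model.eval()

    x = torch.randn(1, 3, 518, 518)        # dummy batch at the native 518x518 resolution

    with torch.no_grad():
        embedding = model(x)                # pooled image embedding, e.g. shape (1, 1024) for ViT-Large
        tokens = model.forward_features(x)  # unpooled class + patch tokens

    print(embedding.shape, tokens.shape)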

Model Features

Self-supervised pre-training
Pre-trained using the DINOv2 self-supervised learning method on the LVD-142M dataset, eliminating the need for manually labeled data.
Large-scale model
A large Vision Transformer architecture with 304.4 million parameters, capable of capturing rich image features.
High-resolution processing
Supports high-resolution image input of 518×518 pixels, suitable for processing visually detailed content (see the preprocessing sketch after this list).
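As a sketch of how to preprocess images at the model's 518×518 pretraining resolution, the following uses timm's data-config helpers; it assumes a recent timm release that provides resolve_model_data_config, and the random image is a stand-in for a real photo.

    import numpy as np
    import timm
    from PIL import Image

    # Sketch: build the eval transform matching the model's pretrained data config.
    model = timm.create_model('vit_large_patch14_dinov2.lvd142m', pretrained=True, num_classes=0)
    data_config = timm.data.resolve_model_data_config(model)        # e.g. input_size=(3, 518, 518)
    transform = timm.data.create_transform(**data_config, is_training=False)

    # Random image as a placeholder; replace with Image.open(<your file>).convert('RGB').
    img = Image.fromarray((np.random.rand(600, 600, 3) * 255).astype(np.uint8))
    x = transform(img).unsqueeze(0)                                  # shape: (1, 3, 518, 518)
    print(x.shape)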

Model Capabilities

Image feature extraction
Image classification
Visual representation learning

Use Cases

Computer vision
Image classification
Can be used to classify image content, supporting top-5 prediction results.
Feature extraction
Can extract high-quality image embedding features for downstream visual tasks, for example as inputs to a lightweight linear probe (see the sketch below).
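The following is a hedged sketch of such a downstream use: a linear probe trained on frozen embeddings. The scikit-learn classifier and the random stand-in data are assumptions for illustration only and should be replaced with real preprocessed images and labels.

    import timm
    import torch
    from sklearn.linear_model import LogisticRegression

    # Sketch of a linear probe on frozen DINOv2 embeddings (random data stands in for a real dataset).
    model = timm.create_model('vit_large_patch14_dinov2.lvd142m', pretrained=True, num_classes=0).eval()

    @torch.no_grad()
    def embed(batch):
        # batch: (N, 3, 518, 518) preprocessed images -> (N, 1024) pooled embeddings as numpy
        return model(batch).cpu().numpy()

    train_x, train_y = torch.randn(8, 3, 518, 518), [0, 1] * 4   # stand-in training data
    test_x = torch.randn(2, 3, 518, 518)                         # stand-in test data

    clf = LogisticRegression(max_iter=1000).fit(embed(train_x), train_y)
    print(clf.predict(embed(test_x)))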