
vit_large_patch14_dinov2.lvd142m

Developed by pcuenq
A vision Transformer (ViT)-based image feature model, pre-trained on the LVD-142M dataset using the self-supervised DINOv2 method.
Downloads: 18
Release date: 1/21/2025

Model Overview

This is a large vision Transformer model used primarily for image feature extraction and image classification. It was pre-trained on the LVD-142M dataset with the DINOv2 self-supervised learning method and produces high-quality image representations.

Model Features

Self-supervised pre-training
Pre-trained on the LVD-142M dataset using the DINOv2 self-supervised learning method, eliminating the need for manually labeled data
Large-scale vision Transformer
Based on the ViT-Large architecture with 304.4 million parameters, capable of processing high-resolution images
High-resolution processing capability
Supports high-resolution image inputs up to 518×518 pixels
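The figures above can be checked with a quick calculation: with 14×14 patches, a 518×518 input divides evenly into a 37×37 grid of patches, plus one class token.

```python
# Worked check of the patching arithmetic for a ViT with patch size 14.
image_size = 518
patch_size = 14

grid = image_size // patch_size   # 37 patches per side (518 / 14 = 37 exactly)
num_patches = grid * grid         # 1369 patch tokens
seq_len = num_patches + 1         # 1370 tokens including the [CLS] token

print(grid, num_patches, seq_len)  # 37 1369 1370
```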

Model Capabilities

Image feature extraction
Image classification
Image representation learning

Use Cases

Computer vision
Image classification
Can be used for various image classification tasks, such as object recognition and scene classification
Image retrieval
Uses extracted image features to retrieve visually similar images
Visual representation learning
Serves as a foundation model for other vision tasks, such as object detection and segmentation
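The retrieval use case above follows a common pattern that is not tied to any specific API: embed every image once, then rank the gallery by cosine similarity to a query embedding. A minimal sketch, assuming each image has already been reduced to a 1024-dimensional feature vector by the model (the data here is random and purely illustrative):

```python
import numpy as np

def top_k_similar(query: np.ndarray, gallery: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k gallery vectors most similar to the query."""
    # L2-normalize so the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 1024))              # 100 hypothetical image embeddings
query = gallery[42] + 0.01 * rng.normal(size=1024)  # near-duplicate of item 42

print(top_k_similar(query, gallery, k=3)[0])  # 42
```

For large galleries the same idea is usually served by an approximate nearest-neighbor index rather than a brute-force scan.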