
WebSSL DINO1B Full2B 224

Developed by Facebook
A 1-billion-parameter Vision Transformer (ViT) trained on 2 billion web images with DINOv2-style self-supervised learning; it learns visual representations without any language supervision.
Downloads 1,172
Release Time: 4/25/2025

Model Overview

This model demonstrates that, at sufficient scale, purely visual self-supervised learning can match or exceed language-supervised models, making it suitable for a wide range of visual tasks.

Model Features

Large-scale self-supervised learning
Trained on 2 billion web images without any language supervision
High-performance visual representations
Matches or exceeds language-supervised models on a range of visual tasks
Efficient architecture design
Uses a ViT architecture with width 1536, depth 40, and 24 attention heads
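As a back-of-the-envelope check, the stated dimensions imply roughly a billion transformer parameters. The sketch below counts only the attention and MLP weights per block (the exact total also includes the patch embedding, norms, and biases, which it ignores):

```python
# Rough parameter-count estimate for a ViT with the dimensions above
# (width 1536, depth 40). Per transformer block:
#   attention: 4 * d^2  (Q, K, V, and output projections)
#   MLP:       8 * d^2  (two linear layers with a 4x hidden expansion)

def vit_param_estimate(width: int, depth: int, mlp_ratio: int = 4) -> int:
    attn = 4 * width * width                  # Q/K/V/out projection matrices
    mlp = 2 * width * (mlp_ratio * width)     # up- and down-projection
    return depth * (attn + mlp)

params = vit_param_estimate(width=1536, depth=40)
print(f"~{params / 1e9:.2f}B parameters")  # ~1.13B, consistent with "1B"
```

Note that the head count (24) does not affect this estimate: splitting the width across heads changes how attention is computed, not the projection matrix sizes.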

Model Capabilities

Image feature extraction
Visual representation learning
Image classification
Object detection

Use Cases

Computer Vision
Image classification
Use the model's extracted image features for downstream classification tasks
Object detection
Use the learned visual representations as a backbone for object detection