Webssl Dino2b Full2b 224
Developed by Facebook
A 2-billion-parameter vision Transformer trained on 2 billion web images through pure visual self-supervised learning, excelling in multimodal tasks.
Release Date: 4/25/2025
Model Overview
This is a 2-billion-parameter vision Transformer trained with the DINOv2 self-supervised learning framework. It requires no language supervision, yet matches or surpasses language-supervised models across a range of vision tasks.
Model Features
Pure visual self-supervised learning
No language supervision required; trained solely on visual data
Large-scale training
Trained on 2 billion web image samples
High performance
Excellent performance on traditional vision benchmarks and multimodal tasks
Dual attention implementation
Supports both 'eager' and 'sdpa' attention implementations
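Both implementations compute the same scaled dot-product attention; 'sdpa' simply dispatches to a fused kernel, while 'eager' materializes the attention weights step by step. A minimal NumPy sketch of the underlying math (illustrative only, not the model's actual code):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Eager-style attention: softmax(QK^T / sqrt(d)) @ V."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (batch, tokens, tokens)
    # Numerically stable softmax over the last axis.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 4, 8))  # (batch, tokens, head_dim)
k = rng.normal(size=(1, 4, 8))
v = rng.normal(size=(1, 4, 8))
out, w = scaled_dot_product_attention(q, k, v)
```

The 'sdpa' backend avoids materializing `weights` explicitly, which saves memory at long sequence lengths; the result is numerically equivalent.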
Model Capabilities
Image feature extraction
Visual representation learning
Multimodal task processing
Visual question answering
OCR (optical character recognition)
Chart understanding
Use Cases
Computer vision
Image classification
Utilizing image features extracted by the model for classification tasks
Performance on par with or surpassing language-supervised models
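A linear-probe-style evaluation of frozen features can be sketched as follows. The features here are synthetic stand-ins for the model's embeddings (the 2B model is too large to run inline), and the nearest-centroid classifier is a deliberately simple probe:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim = 3, 16

# Synthetic stand-ins for frozen image features: one well-separated cluster per class.
centers = rng.normal(size=(n_classes, dim)) * 5.0
labels = rng.integers(0, n_classes, size=120)
features = centers[labels] + rng.normal(size=(120, dim))

# Nearest-class-centroid "probe": fit centroids on a train split, predict on a test split.
train_x, test_x = features[:100], features[100:]
train_y, test_y = labels[:100], labels[100:]
centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in range(n_classes)])
dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
preds = dists.argmin(axis=1)
accuracy = (preds == test_y).mean()
```

In practice a logistic-regression probe on the real frozen features replaces the synthetic data and centroid rule; the pipeline shape (freeze backbone, fit a light classifier on features) is the same.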
Object detection
Object localization through the model's patch token features
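Patch-token features can be turned into a coarse localization map by scoring each patch against a query embedding and reshaping the scores to the patch grid (a 224-px input at patch size 14 gives a 16x16 grid). A toy sketch with random tokens standing in for the model's patch features:

```python
import numpy as np

rng = np.random.default_rng(0)
grid, dim = 16, 32  # 224-px input, patch size 14 -> 16x16 patch grid
tokens = rng.normal(size=(grid * grid, dim))  # stand-in patch-token features

# Plant a distinctive "object" direction at one patch so there is something to find.
query = rng.normal(size=dim)
target = 5 * grid + 7  # row 5, col 7 of the grid
tokens[target] += 4.0 * query

# Cosine similarity of each patch token to the query, reshaped to the grid.
norms = np.linalg.norm(tokens, axis=1) * np.linalg.norm(query)
heatmap = (tokens @ query / norms).reshape(grid, grid)
row, col = np.unravel_index(heatmap.argmax(), heatmap.shape)
```

With real features the query would come from, e.g., a text encoder or a reference image crop, and the heatmap would be upsampled back to pixel coordinates.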
Multimodal applications
Visual question answering
Combining with language models to answer questions about image content
Excellent performance
Chart understanding
Parsing and understanding visual information in charts
© 2025 AIbase