Webssl Dino7b Full8b 518
Developed by facebook
A 7-billion-parameter Vision Transformer trained on 8 billion MetaCLIP image samples using the DINOv2 self-supervised learning framework, with no language supervision
Downloads 157
Release Time: 4/25/2025
Model Overview
This is a Vision Transformer trained on web-scale image data through self-supervised learning. It demonstrates that purely visual learning can match or even surpass language-supervised models on a range of vision tasks
Model Features
Pure visual self-supervised learning
No language supervision; trained solely on web image data
Large-scale training data
Trained on 8 billion MetaCLIP web image samples
High-resolution processing
Supports high-resolution image input of 518×518 pixels
Multi-task adaptability
Strong performance on traditional vision benchmarks and multimodal tasks
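To make the 518×518 input size concrete, the following minimal sketch computes how many patch tokens a Vision Transformer would see for one image. The patch size of 14 is an assumption carried over from the standard DINOv2 configuration and is not stated on this page; `patch_grid` is a hypothetical helper name used only for illustration.

```python
from typing import Tuple


def patch_grid(image_size: int = 518, patch_size: int = 14) -> Tuple[int, int]:
    """Return (patches_per_side, total_patches) for a square input image.

    patch_size=14 is an assumption based on the usual DINOv2 setup.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


per_side, total = patch_grid()
print(per_side, total)  # 37 1369
```

Under that assumption, each image yields a 37×37 grid of patch tokens, which is why high-resolution input is noticeably more expensive than the 224×224 setting often used for smaller models.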
Model Capabilities
Image feature extraction
Visual representation learning
Visual question answering
OCR recognition
Chart understanding
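Image feature extraction might look like the sketch below. It assumes the checkpoint is published on the Hugging Face Hub under the id `facebook/webssl-dino7b-full8b-518`, that it loads through the `transformers` `AutoModel` and `AutoImageProcessor` classes as a DINOv2-style model without register tokens, and that `torch`, `transformers`, and `Pillow` are installed. `extract_features` is a hypothetical helper, not part of any library. A 7B-parameter model needs substantial memory, so treat this as an outline rather than a tuned pipeline.

```python
# Assumed Hub id, inferred from the model name on this page.
MODEL_ID = "facebook/webssl-dino7b-full8b-518"


def extract_features(image_path: str, model_id: str = MODEL_ID):
    """Return (cls_token, patch_tokens) for a single image."""
    # Heavy imports are kept local so this module can be imported without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    tokens = outputs.last_hidden_state  # (1, 1 + num_patches, hidden_dim)
    cls_token = tokens[:, 0]            # global image embedding
    patch_tokens = tokens[:, 1:]        # per-patch embeddings (assumes no register tokens)
    return cls_token, patch_tokens
```

The global embedding is the usual input for classification or retrieval, while the per-patch embeddings are what dense tasks and multimodal connectors typically consume.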
Use Cases
Computer vision
Image classification
Feature extraction for image classification tasks
Strong performance on traditional vision benchmarks
Object detection
Serves as a base feature extractor for object detection tasks
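As an illustration of using frozen features for image classification, the sketch below runs a nearest-centroid classifier with cosine similarity over precomputed embeddings. This is a stand-in technique chosen for brevity, not the evaluation protocol used by the model's authors, and the 3-dimensional vectors are toy data in place of real model embeddings. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_centroids(
    features: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Average the feature vectors of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(feature: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Pick the class whose centroid is most similar to the feature."""
    return max(centroids, key=lambda label: cosine(feature, centroids[label]))


# Toy embeddings standing in for frozen backbone features.
train = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
labels = ["cat", "cat", "dog", "dog"]
centroids = class_centroids(train, labels)
print(predict([0.8, 0.2, 0.0], centroids))  # cat
```

Because the backbone stays frozen, only the centroids (or, in practice, a linear layer) need to be fitted per task.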
Multimodal applications
Visual question answering
Used for question-answering systems requiring image content understanding
Document understanding
Used for OCR and document layout analysis