
WebSSL DINO 7B Full8B 518

Developed by Facebook (Meta)
A 7-billion-parameter vision Transformer trained on 8 billion web images from MetaCLIP data using the DINOv2 self-supervised learning framework, with no language supervision.
Downloads 157
Release Date: 4/25/2025

Model Overview

This vision Transformer was trained on web-scale image data through self-supervised learning. It demonstrates that purely visual learning can match, and in some cases surpass, language-supervised models across a range of vision tasks.

Model Features

Pure visual self-supervised learning: trained entirely on web image data, with no language supervision of any kind.
Large-scale training data: trained on 8 billion web image samples from MetaCLIP.
High-resolution processing: supports image input at 518×518 pixels.
Multi-task adaptability: strong performance on both traditional vision benchmarks and multimodal tasks.

Model Capabilities

Image feature extraction
Visual representation learning
Visual question answering
Optical character recognition (OCR)
Chart understanding

Use Cases

Computer vision
Image classification: feature extraction for classification tasks, with strong results on traditional vision benchmarks.
Object detection: serves as a backbone feature extractor for detection tasks.
Multimodal applications
Visual question answering: question-answering systems that require understanding of image content.
Document understanding: OCR and document layout analysis.
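The "feature extraction for image classification" use case above can be illustrated with a minimal downstream sketch: once the model has produced per-image embeddings, a nearest-neighbor classifier over cosine similarity is a common language-free baseline. The embeddings and labels below are random stand-ins, not real model outputs.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def nn_classify(query: np.ndarray, gallery: np.ndarray, labels: list[str]) -> str:
    """Label a query embedding with the label of its most similar gallery embedding."""
    sims = cosine_sim(query[None, :], gallery)[0]
    return labels[int(np.argmax(sims))]

rng = np.random.default_rng(0)
gallery = rng.normal(size=(4, 8))            # 4 reference images, 8-dim stand-in embeddings
labels = ["cat", "dog", "car", "chart"]
query = gallery[2] + 0.01 * rng.normal(size=8)  # slightly perturbed copy of the "car" embedding
print(nn_classify(query, gallery, labels))
```

With real embeddings from the model, `gallery` would hold one feature vector per labeled reference image and `query` the feature vector of the image to classify.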