
WebSSL DINO2B Light2B 224

Developed by Facebook
A 2-billion-parameter vision Transformer model trained using the DINOv2 self-supervised learning framework on lightly filtered web-scale image data (without language supervision).
Downloads: 27
Release date: April 25, 2025

Model Overview

This model is trained via self-supervised learning on lightly filtered web image data, focusing on pure visual representation learning. It is suitable for various vision tasks and excels particularly in OCR and chart understanding.

Model Features

Pure Visual Learning
Self-supervised training using only image data, without language supervision.
Lightly Filtered Data
Trained on a lightly filtered subset of MetaCLIP data (retaining ~50.3% of the original data), balancing data quality and diversity.
Large-Scale Parameters
A 2-billion-parameter vision Transformer architecture providing powerful representation capabilities.
OCR and Chart Understanding Advantage
Enhances OCR and chart understanding while maintaining performance across all vision tasks.

Model Capabilities

Image feature extraction
Visual representation learning
OCR tasks
Chart understanding
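
The feature-extraction capability above can be sketched with the standard Hugging Face `transformers` loading pattern. The Hub id `facebook/webssl-dino2b-light2b-224` and the patch size of 14 are assumptions taken from the model name and the DINOv2 family convention, not a verified recipe; check the published model config before relying on them.

```python
MODEL_ID = "facebook/webssl-dino2b-light2b-224"  # assumed Hub id


def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Patch tokens a ViT produces for a square image (excluding CLS/register tokens)."""
    return (image_size // patch_size) ** 2


def extract_features(image):
    # torch/transformers are imported lazily so the helper above stays lightweight.
    import torch
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state: (batch, tokens, hidden). Token 0 is the CLS token,
    # a global image embedding; the remaining patch tokens suit dense tasks
    # such as detection or segmentation.
    return outputs.last_hidden_state
```

At 224×224 input with a 14-pixel patch, this would yield 16×16 = 256 patch tokens per image.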

Use Cases

Computer Vision
Image Classification
Utilizes image features extracted by the model for classification tasks.
Object Detection
Performs object localization and recognition using the model's patch token features.
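
The classification use case above is typically realized as a linear probe on frozen features. Below is a minimal sketch of that pipeline; the random vectors are placeholders standing in for the CLS embeddings the model would produce (substitute its real hidden size and extracted features in practice).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder "features": random vectors standing in for frozen CLS
# embeddings; DIM is arbitrary for the demo.
DIM = 64
X_train = rng.normal(size=(40, DIM))
y_train = np.repeat([0, 1], 20)
X_train[y_train == 1] += 2.0  # shift one class so the toy problem is separable

# The probe: a simple logistic-regression classifier on frozen features.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

X_test = rng.normal(size=(4, DIM))
X_test[2:] += 2.0
preds = probe.predict(X_test)
```

Because the backbone stays frozen, only the small linear head is trained, which is the usual way self-supervised representations like these are evaluated.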
Document Analysis
OCR Recognition
Identifies text content in images.
Delivers a significant improvement over comparable vision models.
Chart Understanding
Interprets charts and data visualizations in images.
Outperforms language-supervised models