
Webssl Dino3b Heavy2b 224

Developed by facebook
A 3-billion-parameter vision Transformer trained with the DINOv2 self-supervised learning framework on a carefully curated subset of the 2-billion-image MetaCLIP dataset
Downloads: 26
Release date: 4/25/2025

Model Overview

This is a vision Transformer trained through self-supervised learning. It specializes in image understanding tasks and is particularly adept at processing charts and document images that contain text.
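As a rough illustration of how such a checkpoint is typically used for feature extraction, the sketch below assumes the weights are published on the Hugging Face Hub under the ID facebook/webssl-dino3b-heavy2b-224 and load through the generic transformers AutoModel / AutoImageProcessor interface; the exact identifier, output layout, and pooling convention should be verified against the official model card.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Assumed Hub ID; check the official model card for the exact identifier.
MODEL_ID = "facebook/webssl-dino3b-heavy2b-224"

processor = AutoImageProcessor.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

# Any RGB image works: a photo, a chart, or a scanned document page.
image = Image.open("document_page.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")  # resize to 224x224 + normalize

with torch.no_grad():
    outputs = model(**inputs)

# For DINOv2-style ViTs, token 0 is usually the global [CLS] embedding and the
# remaining tokens are per-patch features; verify against the model's config.
global_feature = outputs.last_hidden_state[:, 0]   # (1, hidden_dim)
patch_features = outputs.last_hidden_state[:, 1:]  # (1, num_patches, hidden_dim)
print(global_feature.shape, patch_features.shape)
```

The global token serves as a compact image descriptor for retrieval or classification, while the patch tokens can feed dense downstream tasks.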

Model Features

Curated data training
Trained on only 1.3% of the original MetaCLIP dataset: a high-quality subset rich in charts, tables, and document images with readable text
Self-supervised learning
Trained with the DINOv2 framework, learning powerful visual representations without any language supervision
Massive parameter count
A 3-billion-parameter vision Transformer architecture capable of capturing complex visual features
OCR-enhanced
Optimized for text and chart understanding, significantly improving OCR capabilities while maintaining performance on other vision tasks

Model Capabilities

Image feature extraction
Visual representation learning
Chart understanding
Document image analysis
OCR-related tasks

Use Cases

Document processing
Table recognition
Extracting table structures and contents from scanned documents
High-precision table recognition capability
Chart understanding
Analyzing chart images and extracting key information
Accurate chart content parsing
Computer vision
Image retrieval
Image search based on visual features (see the retrieval sketch after this list)
Efficient image similarity matching
Visual representation learning
Providing pretrained visual features for downstream tasks (see the linear-probe sketch after this list)
Strong transfer learning capability
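To make the image-retrieval use case concrete, here is a minimal cosine-similarity sketch. The Hub ID is the same assumption as in the earlier snippet, and the gallery and query file names are placeholders for your own images.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

MODEL_ID = "facebook/webssl-dino3b-heavy2b-224"  # assumed Hub ID, as above
processor = AutoImageProcessor.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

def embed(paths):
    """Return L2-normalized global features for a list of image paths."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model(**inputs).last_hidden_state[:, 0]  # global [CLS]-style token
    return F.normalize(feats, dim=-1)

# Placeholder file names; replace with your own collection.
gallery_paths = ["img_000.jpg", "img_001.jpg", "img_002.jpg"]
query_path = "query.jpg"

gallery = embed(gallery_paths)   # (N, D)
query = embed([query_path])      # (1, D)

# On unit-normalized vectors, cosine similarity is a plain dot product.
scores = query @ gallery.T       # (1, N)
best = torch.topk(scores, k=min(3, len(gallery_paths)), dim=-1)
print(best.indices, best.values)
```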
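For the transfer-learning use case, a common recipe is a linear probe: the backbone stays frozen and only a single linear layer is trained on its global features. The sketch below uses random stand-in tensors so it runs on its own; in practice the features would be extracted with the backbone as shown earlier, and the feature dimension should be read from the model's config.

```python
import torch
import torch.nn as nn

# Stand-in data for illustration only: in practice, `features` are frozen global
# embeddings from the backbone over a labeled training set and `labels` are class ids.
num_samples, feature_dim, num_classes = 256, 3072, 10  # feature_dim is an assumption
features = torch.randn(num_samples, feature_dim)
labels = torch.randint(0, num_classes, (num_samples,))

probe = nn.Linear(feature_dim, num_classes)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Only the probe's weights are updated; the vision backbone is never touched.
for epoch in range(10):
    logits = probe(features)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"final training loss: {loss.item():.3f}")
```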