Webssl Dino2b Heavy2b 224

Developed by facebook
A 2-billion-parameter self-supervised Vision Transformer trained on heavily filtered web-scale image data and optimized for chart and text understanding.
Downloads 24
Release Time: 4/25/2025

Model Overview

This Vision Transformer is trained with self-supervised learning on filtered web-scale image data. The filtering keeps images that contain charts, tables, and readable text documents, and the model performs strongly on OCR and chart understanding tasks.

Model Features

Rigorously filtered training data
Trained on a filtered subset amounting to only 1.3% of the original MetaCLIP dataset, selected to contain charts, tables, and readable text documents.
Self-supervised learning
Uses the DINOv2 self-supervised learning approach, which learns visual representations without language supervision.
Large-scale parameters
A 2-billion-parameter Vision Transformer architecture for feature extraction.
Optimized OCR capabilities
Optimized for text and chart understanding through its training data selection, with strong performance on related tasks.

Model Capabilities

Image feature extraction
Visual representation learning
Chart understanding
Text detection
Table recognition
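
The image feature extraction capability listed above could be exercised roughly as follows. This is a minimal sketch, not the official usage: the model identifier facebook/webssl-dino2b-heavy2b-224 is inferred from the page title, loading through the Dinov2Model class of the transformers library is assumed from the DINOv2 training approach, and the patch size of 14 is the DINOv2 default rather than a value stated on this page. The names expected_token_count and extract_features are hypothetical helpers.

```python
def expected_token_count(image_size: int = 224, patch_size: int = 14) -> int:
    """Tokens a plain ViT emits: one per image patch plus one CLS token.

    patch_size=14 is the DINOv2 default and an assumption for this checkpoint.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side + 1


def extract_features(image_path: str,
                     model_id: str = "facebook/webssl-dino2b-heavy2b-224"):
    """Return (cls_features, patch_features) for a single image.

    model_id is an assumption inferred from the page title.
    Heavy imports are done lazily so the helper above works without them.
    """
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, Dinov2Model

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = Dinov2Model.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)

    hidden = outputs.last_hidden_state  # shape: (batch, tokens, hidden_dim)
    cls_features = hidden[:, 0]         # one global vector per image
    patch_features = hidden[:, 1:]      # one vector per image patch
    return cls_features, patch_features
```

Calling extract_features("path/to/image.png") with a real image path downloads the checkpoint on first use, which is large for a 2-billion-parameter model. The CLS vector suits whole-image tasks such as chart classification, while the patch vectors keep spatial layout and are the more natural input for text or table localization.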

Use Cases

Document processing
Table recognition
Extracting table structure and content from images
High-precision table detection and recognition
OCR enhancement
Improving text recognition accuracy in images
Improved text recognition performance in complex backgrounds
Visual understanding
Chart analysis
Understanding various chart types and data in images
Accurate chart classification and data extraction
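
Because the model is a frozen feature extractor, a use case such as chart classification needs a small classifier on top of its features. The sketch below illustrates one simple option, a nearest-prototype classifier with cosine similarity. It is a swapped-in illustration, not the method used by the model's authors, and the 3-dimensional vectors are toy stand-ins for real feature vectors. All function names and labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def build_prototypes(
    labeled: Dict[str, Sequence[Sequence[float]]]
) -> Dict[str, List[float]]:
    """Average the example features of each class into one prototype."""
    return {label: mean_vector(vectors) for label, vectors in labeled.items()}


def classify(feature: Sequence[float],
             prototypes: Dict[str, List[float]]) -> str:
    """Return the label whose prototype is most similar to the feature."""
    return max(prototypes,
               key=lambda label: cosine_similarity(feature, prototypes[label]))


# Toy vectors standing in for per-image features (illustrative values only).
labeled_features = {
    "bar_chart": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "table": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
prototypes = build_prototypes(labeled_features)
prediction = classify([0.7, 0.3, 0.1], prototypes)
print(prediction)
```

In practice the toy vectors would be replaced by features extracted from labeled example images, and a trained linear probe may be a better choice once enough labeled data is available.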