Webssl Mae700m Full2b 224

Developed by Facebook
This is a 700M-parameter Vision Transformer model trained on 2 billion web images using masked autoencoder self-supervised learning, without language supervision.
Downloads 15
Release Time: 4/25/2025

Model Overview

Web-SSL MAE ViT-H is a large-scale visual representation learning model built on the Vision Transformer architecture. It is trained with masked autoencoder (MAE) self-supervised learning on 2 billion web images, and its features are suitable for a range of visual tasks.
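
To make the masked autoencoder idea mentioned above concrete, here is a minimal sketch of the patch-masking step: the image is split into a grid of patches, most of them are hidden, and the encoder only sees the rest. The function name random_patch_mask is hypothetical, and the patch size of 16 and mask ratio of 0.75 are illustrative assumptions (0.75 is the ratio used in the original MAE paper); neither value is confirmed for this model by this page.

```python
import random


def random_patch_mask(image_size=224, patch_size=16, mask_ratio=0.75, seed=0):
    """Pick which patches of a square image are hidden from the encoder.

    patch_size and mask_ratio are illustrative assumptions, not values
    taken from this model's configuration.
    Returns (visible_ids, masked_ids) as sorted lists of patch indices.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    num_masked = int(num_patches * mask_ratio)

    # Shuffle the patch indices and hide the first num_masked of them.
    rng = random.Random(seed)
    ids = list(range(num_patches))
    rng.shuffle(ids)
    masked = sorted(ids[:num_masked])
    visible = sorted(ids[num_masked:])
    return visible, masked


visible, masked = random_patch_mask()
print(len(visible), len(masked))  # 49 147
```

With these assumed settings the encoder processes only 49 of 196 patches; during pretraining a lightweight decoder would then be asked to reconstruct the 147 hidden ones.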

Model Features

Large-scale Self-supervised Learning
Trained on 2 billion web images from the MetaCLIP data, without language supervision
High-performance Visual Representation
Excels in various visual tasks, particularly in OCR and chart understanding
Pure Visual Learning
Demonstrates that pure visual learning can match or surpass language-supervised models when properly scaled

Model Capabilities

Image Feature Extraction
Visual Representation Learning
Optical Character Recognition (OCR)
Chart Understanding
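
As a rough illustration of image feature extraction, the sketch below loads the encoder through the Hugging Face transformers library and returns patch-level features for one image. It rests on several assumptions: the hub identifier facebook/webssl-mae700m-full2b-224 is inferred from the model name and developer listed above, the checkpoint is assumed to load through AutoModel as a standard ViT, and extract_features is a hypothetical helper name. Check the official model card before relying on any of these.

```python
def extract_features(image_path, model_id="facebook/webssl-mae700m-full2b-224"):
    """Return token-level features for one image as a torch tensor.

    model_id is an assumption inferred from the model name and developer;
    verify it on the hosting hub before use.
    """
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Shape: (1, num_tokens, hidden_size)
    return outputs.last_hidden_state
```

Calling extract_features("path/to/image.jpg") downloads the weights on first use. The returned features can be fed to a downstream head or a multimodal model; the encoder on its own produces representations, not text or labels.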

Use Cases

Document Processing
OCR Text Recognition
Extract text content from images
Performs excellently in OCR tasks
Data Visualization
Chart Understanding
Analyze and interpret chart content
Outstanding performance in chart understanding tasks
General Visual Tasks
Image Classification
Classify image content
Remains competitive in traditional visual benchmarks
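
For the image classification use case, a frozen encoder is typically paired with a small classifier trained on its features. The sketch below uses a nearest-centroid classifier with cosine similarity, a deliberately simple stand-in and not the evaluation protocol behind the benchmark claims above. The function names, the toy 3-dimensional vectors, and the "chart" and "photo" labels are all hypothetical; real features from the encoder would be much higher-dimensional.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(features, labels):
    """Average the feature vectors of each class into one centroid."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(vec, centroids):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy vectors standing in for real encoder features (hypothetical data).
train = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.2], [0.1, 0.9, 0.0]]
labels = ["chart", "chart", "photo", "photo"]
centroids = fit_centroids(train, labels)
print(classify([0.8, 0.2, 0.0], centroids))  # chart
```

In practice a linear probe or fine-tuned head would usually replace the centroid step, but the division of labor is the same: the encoder stays fixed and only the lightweight classifier is fit to labeled data.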