
WebSSL MAE-1B Full2B 224

Developed by Facebook
A 1-billion-parameter Vision Transformer (ViT) trained with masked autoencoder (MAE) self-supervised learning on 2 billion web images, learning visual representations without any language supervision.
Release date: April 25, 2025

Model Overview

This model demonstrates that purely visual self-supervised learning can match or surpass language-supervised models on a range of vision tasks, excelling particularly at OCR and chart understanding.

Model Features

Large-Scale Self-Supervised Learning: trained on 2 billion web images without any language supervision.
Efficient Visual Representation: outperforms language-supervised models on tasks such as OCR and chart understanding.
Pure Visual Architecture: a ViT backbone dedicated entirely to visual information processing.
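The masked-autoencoder training listed above works by hiding most of an image's patches and asking the model to reconstruct them from the visible remainder. A minimal sketch of the patch-masking step, with the common 75% mask ratio as an assumed default (the exact ratio used for this checkpoint is not stated here):

```python
import numpy as np

def random_mask(num_patches, mask_ratio=0.75, seed=None):
    """MAE-style masking: choose which patches the encoder sees.

    Returns the sorted indices of visible patches and a boolean mask
    where True marks a patch the decoder must reconstruct.
    """
    rng = np.random.default_rng(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    shuffled = rng.permutation(num_patches)
    keep_idx = np.sort(shuffled[:num_keep])
    mask = np.ones(num_patches, dtype=bool)
    mask[keep_idx] = False  # False = visible to the encoder
    return keep_idx, mask

# A 224x224 image cut into 16x16 patches yields 14*14 = 196 patches;
# at a 0.75 mask ratio the encoder sees only 49 of them.
keep_idx, mask = random_mask(196, mask_ratio=0.75, seed=0)
print(len(keep_idx), int(mask.sum()))  # 49 visible, 147 masked
```

Encoding only the visible quarter of the patches is what makes MAE pretraining cheap enough to scale to billions of images.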

Model Capabilities

Image Feature Extraction
Visual Representation Learning
OCR Task Processing
Chart Understanding
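For the image-feature-extraction capability, a hedged usage sketch follows. It assumes the checkpoint is published on the Hugging Face Hub as `facebook/webssl-mae1b-full2b-224` and is loadable with the generic `AutoModel`/`AutoImageProcessor` classes; the example image URL is illustrative. The model download (~1B parameters) is kept behind a `__main__` guard:

```python
import numpy as np

def mean_pool(patch_embeddings):
    """Average per-patch embeddings (num_patches, dim) into one image vector."""
    return np.asarray(patch_embeddings).mean(axis=0)

if __name__ == "__main__":
    # Heavy part: downloads the 1B-parameter checkpoint; run as a script.
    import requests
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    name = "facebook/webssl-mae1b-full2b-224"  # assumed Hub id
    processor = AutoImageProcessor.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Pool the final hidden states into a single feature vector.
    features = mean_pool(outputs.last_hidden_state[0].numpy())
    print(features.shape)
```

The pooled vector can then feed a linear probe or nearest-neighbor retrieval, which is the typical way such self-supervised backbones are evaluated.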

Use Cases

Document Processing
Optical Character Recognition (OCR): extract text from images, with recognition accuracy that surpasses language-supervised models.
Data Visualization
Chart Understanding: parse the data and relationships presented in charts.