
DiT Large Finetuned RVL-CDIP

Developed by Microsoft
Document image classification model pretrained on IIT-CDIP and fine-tuned on RVL-CDIP, built on a Transformer architecture
Downloads: 67
Release Time: 3/7/2022

Model Overview

This model is a DiT (Document Image Transformer) encoder, pretrained in a self-supervised manner on a large-scale collection of document images and then fine-tuned for document image classification.

Model Features

Large-scale Pretraining
Pretrained on 42 million document images from the IIT-CDIP dataset
Domain-specific Fine-tuning
Fine-tuned on the RVL-CDIP document image dataset, which covers 16 categories
Transformer Architecture
Uses the same Transformer encoder architecture as BEiT
Self-supervised Learning
Pretrained with a masked image patch prediction task

Model Capabilities

Document image classification
Document feature extraction
Image patch encoding
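For document feature extraction, one common approach is to take the encoder's final hidden states and pool the patch tokens into a single vector per page. The sketch below assumes the `transformers`, `torch` and Pillow packages, the Hugging Face Hub checkpoint `microsoft/dit-large-finetuned-rvlcdip`, and mean pooling over the patch tokens (skipping the leading [CLS] token). `embed_document`, `mean_pool_patch_tokens` and `cosine_similarity` are hypothetical helper names, not part of any library.

```python
import numpy as np


def mean_pool_patch_tokens(hidden_states: np.ndarray) -> np.ndarray:
    """Average a (batch, 1 + num_patches, dim) array over its patch tokens.

    Index 0 is assumed to be the [CLS] token, so it is excluded from the mean.
    """
    return hidden_states[:, 1:, :].mean(axis=1)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def embed_document(image_path: str,
                   model_id: str = "microsoft/dit-large-finetuned-rvlcdip") -> np.ndarray:
    """Return one pooled embedding vector for a document image (hypothetical helper)."""
    # Heavy dependencies are imported lazily so the pure helpers above
    # can be used without them installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Output of the last encoder layer: shape (1, 1 + num_patches, hidden_size).
    last_layer = outputs.hidden_states[-1].numpy()
    return mean_pool_patch_tokens(last_layer)[0]
```

Usage, with two placeholder file names: `cosine_similarity(embed_document("a.png"), embed_document("b.png"))` returns a score in [-1, 1], where higher values suggest more similar page representations.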

Use Cases

Document Processing
Document Classification
Classify document images into 16 predefined categories
Performs well on the RVL-CDIP dataset
Table Detection
Identify table regions in documents
Document Layout Analysis
Analyze document layout structure
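For the document classification use case, a minimal inference sketch with Hugging Face `transformers` could look like the following. It assumes `transformers`, `torch` and Pillow are installed and that the checkpoint is available on the Hub as `microsoft/dit-large-finetuned-rvlcdip`; the 16 category names are read from `model.config.id2label` rather than hard-coded. `classify_document` and `top_k_labels` are hypothetical helpers. The sketch covers classification only: this classification checkpoint does not output table or layout regions.

```python
import math
from typing import Dict, List, Sequence, Tuple

MODEL_ID = "microsoft/dit-large-finetuned-rvlcdip"  # assumed Hub checkpoint id


def top_k_labels(logits: Sequence[float], id2label: Dict[int, str],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Softmax the logits and return the k most probable (label, probability) pairs."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


def classify_document(image_path: str, k: int = 3) -> List[Tuple[str, float]]:
    """Classify one document image into the checkpoint's categories (hypothetical helper)."""
    # Imported lazily so top_k_labels works without these packages installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageClassification.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # one score per category
    return top_k_labels(logits.tolist(), model.config.id2label, k)
```

`classify_document("scan.png")` returns up to k (label, probability) pairs, sorted from most to least probable; `scan.png` is a placeholder path.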