DiT Base Fine-tuned on RVL-CDIP (dit-base-finetuned-rvlcdip)

Developed by Microsoft
DiT (Document Image Transformer) is a Transformer-based document image classification model, pretrained on the IIT-CDIP dataset and fine-tuned on the RVL-CDIP dataset.
Downloads 31.99k
Release Time: 3/7/2022

Model Overview

This model is pretrained on a large collection of document images through self-supervised learning and then fine-tuned for document image classification. Its encoder can also be used to turn document images into vector representations.

Model Features

Self-supervised Pretraining
Pretrained on large-scale document images with a masked image modeling objective: the model predicts visual tokens for image patches that have been masked out
Document Image Classification
Fine-tuned specifically for document images, covering the 16 document categories of RVL-CDIP
Transformer Architecture
Uses the same Transformer architecture as BEiT, which processes an image as a sequence of fixed-size patches
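
As a rough illustration of masked image patch prediction, the sketch below splits a square image into a patch grid and hides a random subset of patches. The 224-pixel input size, 16-pixel patch size, and 40% mask ratio are assumptions chosen for illustration, and uniform random masking is a simplification, not necessarily the pretraining recipe used for this model. The helper names patch_count and random_patch_mask are hypothetical.

```python
import random


def patch_count(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def random_patch_mask(num_patches: int, mask_ratio: float = 0.4, seed: int = 0) -> list[bool]:
    """Return one flag per patch; True means the patch is hidden from the encoder."""
    rng = random.Random(seed)  # seeded so the mask is reproducible
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


n = patch_count()            # 14 x 14 grid under the assumed sizes
mask = random_patch_mask(n)  # the model would be trained to predict the hidden patches
print(n, sum(mask))
```

During pretraining, the model sees only the visible patches and is trained to predict the content of the hidden ones, which is what lets it learn from unlabeled document images.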

Model Capabilities

Document Image Classification
Document Feature Extraction
Image Encoding
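
The feature extraction and image encoding capabilities can be sketched with the Hugging Face transformers library. This is a minimal sketch, assuming the checkpoint is published as microsoft/dit-base-finetuned-rvlcdip and that transformers, torch, and Pillow are installed. Mean-pooling the token states is one common way to get a single vector, not a method stated on this page. The names embed_document and cosine_similarity are hypothetical helpers.

```python
import math


def embed_document(image_path, model_id="microsoft/dit-base-finetuned-rvlcdip"):
    """Return one embedding vector (a list of floats) for a document image."""
    # Heavy dependencies are imported lazily so cosine_similarity works without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)  # encoder only, no classification head
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, tokens, hidden_size)
    # Assumption: average over all tokens to obtain a single vector per image.
    return hidden.mean(dim=1)[0].tolist()


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Example (downloads the checkpoint on first use; file names are placeholders):
# similarity = cosine_similarity(embed_document("a.png"), embed_document("b.png"))
```

Embeddings produced this way can be compared with cosine_similarity to find visually similar documents, or fed to a downstream classifier.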

Use Cases

Document Management
Automatic Document Classification
Automatically classifies scanned documents into 16 categories, including advertisements and scientific publications
Performs well on the RVL-CDIP benchmark, the dataset on which the model was fine-tuned
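
Automatic classification of a scanned page might look like the following. This is a minimal sketch, assuming the checkpoint is published as microsoft/dit-base-finetuned-rvlcdip and that transformers, torch, and Pillow are installed. The names classify_document and top_k_labels are hypothetical helpers, and "scan.png" is a placeholder file name.

```python
def top_k_labels(scores, id2label, k=3):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(id2label[i], scores[i]) for i in ranked[:k]]


def classify_document(image_path, model_id="microsoft/dit-base-finetuned-rvlcdip", k=3):
    """Return the top-k predicted document categories with their probabilities."""
    # Heavy dependencies are imported lazily so top_k_labels works without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # one score per document category
    probabilities = logits.softmax(dim=-1)[0].tolist()
    # The class-index-to-name mapping ships with the model configuration.
    return top_k_labels(probabilities, model.config.id2label, k)


# Example (downloads the checkpoint on first use):
# for label, probability in classify_document("scan.png"):
#     print(f"{label}: {probability:.3f}")
```

Returning several candidates instead of a single label makes it easier to route low-confidence scans to manual review.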
Information Extraction
Document Layout Analysis
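Identifies different regions and structures within documents; this requires fine-tuning the pretrained DiT backbone for detection, since this checkpoint itself only predicts document categories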