MLCD-ViT-bigG-patch14-448
Developed by DeepGlint-AI
MLCD-ViT-bigG is an advanced Vision Transformer model enhanced with 2D Rotary Position Encoding (RoPE2D), excelling in document understanding and visual question answering tasks.
Downloads 1,517
Release Time: 2/12/2025
Model Overview
Developed by DeepGlint AI, this model is a Vision Transformer enhanced with 2D Rotary Position Encoding (RoPE2D). It is designed for complex vision-language tasks and performs strongly in document understanding and visual question answering.
Model Features
2D Rotary Position Encoding (RoPE2D)
Applies 2D rotary position encoding, which improves the model's handling of spatial position information
Exceptional Document Understanding
Outperforms peer models in document understanding and visual question answering tasks
High-Resolution Processing
Supports 448px high-resolution image input, capturing finer visual features
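To make the two architectural features above concrete, here is a minimal pure-Python sketch. It first computes the patch grid implied by a 448px input and the patch size of 14 in the model name, then applies one common "axial" form of 2D rotary encoding, in which half of a feature vector is rotated by the patch row and the other half by the patch column. This page does not describe how MLCD implements RoPE2D internally, so the axial split, the frequency base of 10000, and the helper names `patch_grid`, `rotate_pairs`, and `rope2d` are all illustrative assumptions rather than the model's actual code.

```python
import math


def patch_grid(image_size=448, patch_size=14):
    """Return (patches per side, total patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side, side * side


def rotate_pairs(vec, pos, base=10000.0):
    """Apply a 1D rotary encoding to an even-length vector.

    Each consecutive pair (vec[i], vec[i + 1]) is rotated by an angle
    proportional to `pos`, with a frequency that decreases as i grows.
    `base` is an assumed value, not taken from the model card.
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = pos / (base ** (i / dim))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope2d(vec, row, col, base=10000.0):
    """Assumed axial RoPE2D: first half encodes the row, second half the column."""
    if len(vec) % 4 != 0:
        raise ValueError("vector length must be a multiple of 4")
    half = len(vec) // 2
    return rotate_pairs(vec[:half], row, base) + rotate_pairs(vec[half:], col, base)


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


side, total = patch_grid()  # 448 / 14 = 32 patches per side, 32 * 32 = 1024 patches

q = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.1, 0.9]
k = [1.0, 0.3, -0.6, 0.8, -1.2, 0.4, 0.7, -0.5]

# Both patch pairs below share the same (row, col) offset of (2, 3).
score_a = dot(rope2d(q, 3, 5), rope2d(k, 1, 2))
score_b = dot(rope2d(q, 10, 12), rope2d(k, 8, 9))
```

Under these assumptions, `score_a` and `score_b` agree up to floating-point error. This is the property that makes rotary encodings attractive for attention: the query-key score depends on the relative row and column offset between two patches, not on their absolute positions in the grid.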
Model Capabilities
Image Feature Extraction
Document Understanding
Visual Question Answering
Chart Analysis
OCR Enhancement
Use Cases
Document Processing
Document Question Answering
Extract information from complex documents and answer questions
Achieves 83.34% accuracy on the DocVQA dataset
Table Understanding
Parse and understand tabular data in documents
Visual Question Answering
Chart Analysis
Understand and answer questions about charts
Achieves 73.80% accuracy on the ChartQA dataset
Information Extraction
Extract structured information from images
Achieves 46.59% accuracy on the InfoVQA dataset