# OCR Enhancement
Webssl Mae700m Full2b 224
This is a 700M-parameter Vision Transformer model trained on 2 billion web images using masked autoencoder self-supervised learning, without language supervision.
Image Classification
Transformers
facebook
15
0
Turkish LLaVA V0.1
MIT
A Turkish vision-language model designed for multimodal visual instruction following. It processes image and text inputs to understand and carry out instructions given in Turkish.
Image-to-Text Other
ytu-ce-cosmos
86
10
Idefics3 8B Llama3
Apache-2.0
Idefics3 is an open-source multimodal model capable of processing arbitrary sequences of image and text inputs to generate text outputs. It shows significant improvements in OCR, document understanding, and visual reasoning.
Image-to-Text
Transformers English
HuggingFaceM4
45.86k
277
Pix2text Table Rec
MIT
A table structure recognition model built on Microsoft's Table Transformer, for detecting and recognizing tables in documents.
Text Recognition
Transformers
breezedeus
1,124
2
Donut Base Handwriting Recognition
MIT
A handwriting recognition model fine-tuned from naver-clova-ix/donut-base.
Text Recognition
Transformers
Cdywalst
140
1
Sampel2 Docqa Layoutlmv3 Base
A document question answering model fine-tuned from microsoft/layoutlmv2-base-uncased. The training dataset is not specified.
Question Answering System
Transformers
Tejagoud
10
0
Cogagent Vqa Hf
Apache-2.0
CogAgent is an open-source vision-language model based on CogVLM. This variant focuses on single-turn visual question answering.
Image-to-Text
Transformers English
THUDM
238
49
Cogagent Chat Hf
Apache-2.0
CogAgent is an open-source vision-language model that improves on CogVLM, featuring GUI agent capabilities, multi-turn visual dialogue, and visual grounding.
Image-to-Text
Transformers English
THUDM
503
69
Testdocumentquestionanswering
A document visual question answering model based on the LayoutLMv2 architecture, fine-tuned for DocVQA tasks.
Image-to-Text
Transformers
Dhineshk
16
0
Trained Model
This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased on the generator dataset, suitable for document understanding and layout analysis tasks.
Large Language Model
Transformers
vfu
14
0
Donut Receipt V3
MIT
A model fine-tuned from naver-clova-ix/donut-base. Its specific purpose is not stated.
Large Language Model
Transformers
mychen76
28
0
Layoutlmv2 Base Uncased Finetuned Docvqa
A document visual question answering model based on the LayoutLMv2 architecture, fine-tuned for document understanding tasks.
Image-to-Text
Transformers
madiltalay
14
0
Layoutlmv2 Base Uncased Finetuned Docvqa
A document visual question answering model based on the LayoutLMv2 architecture, fine-tuned for document understanding tasks.
Image-to-Text
Transformers
hugginglaoda
16
0
Donut Base Sroie
MIT
A document understanding model fine-tuned from naver-clova-ix/donut-base, specialized in extracting structured information from documents.
Text Recognition
Transformers
enoreyes
15
0
Donut Base Bol
MIT
A document understanding model fine-tuned from naver-clova-ix/donut-base, suitable for image folder datasets.
Text Recognition
Transformers
prakriti42
13
0
Layoutlmv2 Base Uncased Finetuned Docvqa V2
This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased for document visual question answering tasks, focusing on processing text and layout information in document images.
Image-to-Text
Transformers
MariaK
54
3
Donut Base Sroie
MIT
A model fine-tuned from naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.
Text Recognition
Transformers
zahra000
16
0
Donut Base Sroie
MIT
This model is a fine-tuned version of naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.
Text Recognition
Transformers
unstructuredio
31
1
Donut Base Medical Handwritten Blocks Data Extraction
MIT
A model based on the Donut architecture, designed for extracting structured data from handwritten medical documents.
Text Recognition
Transformers
mjawadazad2321
15
1
Donut Base Sroie
MIT
A document understanding model fine-tuned from naver-clova-ix/donut-base, suitable for extracting text from images.
Text Recognition
Transformers
philschmid
185
3
Layoutlmv2 Base Uncased Finetuned Docvqa
A document visual question answering model based on the LayoutLMv2 architecture, fine-tuned for document understanding tasks.
Image-to-Text
Transformers
tiennvcs
983
14
Layoutlmv2 Large Uncased Finetuned Infovqa
A document understanding model based on the LayoutLMv2 architecture, fine-tuned for InfoVQA tasks.
Question Answering System
Transformers
tiennvcs
16
2
Layoutlm Finetuned Funsd
This is a LayoutLM model fine-tuned on the FUNSD dataset, specifically designed for document/form tag classification tasks.
Text Recognition
Transformers
mrm8488
97
2