# OCR Enhancement

Webssl Mae700m Full2b 224

This is a 700M-parameter Vision Transformer model trained on 2 billion web images using masked autoencoder self-supervised learning, without language supervision.

Image Classification
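Masked autoencoder (MAE) pre-training hides most of an image's patches and trains the model to reconstruct them from the visible ones. The sketch below shows only the masking step for a 224×224 input. The patch size of 14 and the 75% mask ratio are assumptions based on common MAE setups and are not confirmed for this checkpoint.

```python
import random

def mae_random_mask(image_size=224, patch_size=14, mask_ratio=0.75, seed=0):
    """Split patch indices into visible and masked sets, MAE-style.

    patch_size=14 and mask_ratio=0.75 are illustrative assumptions.
    """
    # Non-overlapping patches per side and in total.
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    # Number of patches the encoder actually sees.
    num_visible = int(num_patches * (1 - mask_ratio))
    rng = random.Random(seed)
    # Shuffle all indices; the first num_visible stay visible, the rest are masked.
    order = list(range(num_patches))
    rng.shuffle(order)
    visible = sorted(order[:num_visible])
    masked = sorted(order[num_visible:])
    return visible, masked

visible, masked = mae_random_mask()
print(len(visible), len(masked))  # 64 192
```

Under these assumptions the encoder sees only 64 of 256 patches, which is what makes MAE pre-training cheap at this scale.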

Turkish LLaVA V0.1

A Turkish vision-language model designed for multimodal visual instruction following. It accepts image and text inputs and can understand and carry out instructions given in Turkish.

Image-to-Text Other

Idefics3 8B Llama3

Idefics3 is an open-source multimodal model that takes arbitrary sequences of interleaved image and text inputs and generates text outputs. Compared with its predecessor Idefics2, it shows significant improvements in OCR, document understanding, and visual reasoning.

Transformers English

Pix2text Table Rec

A table structure recognition model built on Microsoft's Table Transformer, for detecting tables in documents and recognizing their structure.

Text Recognition
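Table structure recognition models of this kind typically predict separate bounding boxes for table rows and columns, and a common post-processing step derives cell boxes by intersecting each row with each column. The sketch below assumes boxes in `(x0, y0, x1, y1)` pixel coordinates and uses made-up detections. It is not Pix2Text's own post-processing code.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1, y1)

def cells_from_rows_and_columns(rows: List[Box], cols: List[Box]):
    """Build a grid of cell boxes, ordered top-to-bottom, left-to-right."""
    rows = sorted(rows, key=lambda b: b[1])  # sort rows by top edge
    cols = sorted(cols, key=lambda b: b[0])  # sort columns by left edge
    return [[intersect(r, c) for c in cols] for r in rows]

# Hypothetical detections for a 2x2 table spanning (0, 0)-(200, 100).
rows = [(0, 50, 200, 100), (0, 0, 200, 50)]
cols = [(100, 0, 200, 100), (0, 0, 100, 100)]
grid = cells_from_rows_and_columns(rows, cols)
print(grid[0][0])  # (0, 0, 100, 50)
```

Sorting before intersecting gives a grid whose indices match reading order, regardless of the order in which the detector returned the boxes.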

Donut Base Handwriting Recognition

A handwriting recognition model fine-tuned from naver-clova-ix/donut-base.

Text Recognition

Sampel2 Docqa Layoutlmv3 Base

A document question-answering model fine-tuned from microsoft/layoutlmv2-base-uncased. The training dataset is not specified.

Question Answering System

Cogagent Vqa Hf

CogAgent is an open-source vision-language model based on CogVLM. This version focuses on single-turn visual question answering.

Transformers English

Cogagent Chat Hf

CogAgent is an open-source vision-language model that improves on CogVLM, adding GUI agent capabilities, multi-turn visual dialogue, and visual grounding.

Transformers English

Testdocumentquestionanswering

A document visual question answering model based on the LayoutLMv2 architecture and fine-tuned for DocVQA tasks.

This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased trained on the generator dataset, and is suited to document understanding and layout analysis tasks.

Large Language Model
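Extractive document QA models such as LayoutLMv2 fine-tuned on DocVQA score every input token as a possible answer start and a possible answer end; the answer is the highest-scoring valid span. The sketch below shows that decoding step with made-up tokens and logits. The `max_answer_len` cap of 30 is an assumption, and real pipelines also skip question tokens and map subword tokens back to words.

```python
from typing import List, Tuple

def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start + end score.

    Only spans with start <= end and length <= max_answer_len are considered.
    """
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        # The end index must not precede the start and must respect the length cap.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Hypothetical OCR tokens and logits for the question "What is the total?"
tokens = ["Invoice", "No.", "123", "Total:", "$", "45.00"]
start_logits = [0.1, 0.0, 1.2, 0.3, 2.5, 1.0]
end_logits = [0.0, 0.1, 1.5, 0.2, 0.4, 3.0]
s, e = best_span(start_logits, end_logits)
print(" ".join(tokens[s:e + 1]))  # $ 45.00
```

The nested loop is quadratic only up to the length cap, which is why a cap is usually applied.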

Donut Receipt V3

A model fine-tuned from naver-clova-ix/donut-base. Its specific purpose is not stated.

Large Language Model

Layoutlmv2 Base Uncased Finetuned Docvqa

A document visual question answering model based on the LayoutLMv2 architecture, fine-tuned specifically for document understanding tasks.

Layoutlmv2 Base Uncased Finetuned Docvqa

A document visual question answering model based on the LayoutLMv2 architecture, specifically fine-tuned for document understanding tasks.

Donut Base Sroie

A document understanding model fine-tuned from naver-clova-ix/donut-base, specialized in extracting structured information from documents.

Text Recognition
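Donut is OCR-free: it generates a token sequence in which field names appear as tags, and that sequence is then converted to JSON (in Hugging Face Transformers, `DonutProcessor.token2json` performs this step). The function below is a simplified, hypothetical stand-in that handles flat fields only, applied to an invented receipt string. The field names `company`, `date`, `address`, and `total` are SROIE's four key fields, while the exact tag format this checkpoint emits is an assumption.

```python
import re

def flat_token2json(sequence: str) -> dict:
    """Parse flat <s_key>value</s_key> pairs from a Donut-style output string.

    Hypothetical, simplified helper: nested groups and repeated keys are not handled.
    """
    # Non-greedy match of an opening tag, its value, and the matching closing tag.
    pattern = re.compile(r"<s_(\w+)>(.*?)</s_\1>", re.DOTALL)
    return {key: value.strip() for key, value in pattern.findall(sequence)}

# Hypothetical decoded output for a receipt.
output = ("<s_company>ACME STORE</s_company><s_date>01/02/2019</s_date>"
          "<s_address>1 MAIN ST</s_address><s_total>12.50</s_total>")
print(flat_token2json(output))
```

For nested or list-valued fields, the library's own `token2json` is the safer choice.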

A document understanding model fine-tuned from naver-clova-ix/donut-base on an image folder dataset.

Text Recognition

Layoutlmv2 Base Uncased Finetuned Docvqa V2

This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased for document visual question answering tasks, focusing on processing text and layout information in document images.
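To combine text with layout, LayoutLM-family models take a bounding box for every word, scaled to a 0-1000 grid that is independent of page size. The helper below is a minimal sketch of that scaling, using a made-up page size and word box. In practice the Transformers LayoutLMv2 processor can run OCR and produce these normalized boxes itself.

```python
from typing import List, Tuple

def normalize_box(box: Tuple[int, int, int, int], width: int, height: int) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range used by LayoutLM models."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# Hypothetical OCR word box on an 850 x 1100 pixel page.
print(normalize_box((85, 110, 170, 132), width=850, height=1100))  # [100, 100, 200, 120]
```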

Donut Base Sroie

A model fine-tuned from naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.

Text Recognition

Donut Base Sroie

This model is a fine-tuned version of naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.

Text Recognition

Donut Base Medical Handwritten Blocks Data Extraction

A model based on the Donut architecture, designed specifically for extracting structured data from handwritten medical documents.

Text Recognition

Donut Base Sroie

A document understanding model fine-tuned from naver-clova-ix/donut-base, suitable for extracting text from images.

Text Recognition

Layoutlmv2 Base Uncased Finetuned Docvqa

A document visual question answering model based on the LayoutLMv2 architecture, fine-tuned for document understanding tasks.

Layoutlmv2 Large Uncased Finetuned Infovqa

A document understanding model based on the LayoutLMv2 architecture, fine-tuned for InfoVQA (infographic visual question answering) tasks.

Question Answering System

Layoutlm Finetuned Funsd

This is a LayoutLM model fine-tuned on the FUNSD dataset for token classification in scanned forms, labeling words as headers, questions, answers, or other text.

Text Recognition
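Token classification on FUNSD assigns each word a BIO tag over the entity types `HEADER`, `QUESTION`, and `ANSWER`, plus `O` for other text. The sketch below, with invented words and predictions, shows one way to merge word-level tags into form entities. Treating an `I-` tag that does not continue the current entity as the start of a new entity is a design choice, not a fixed rule.

```python
from typing import List, Tuple

def group_bio(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level BIO tags into (entity_type, text) pairs."""
    entities = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts: flush the previous one first.
            if current_type:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_words.append(word)
        else:
            # "O" tag: close any open entity.
            if current_type:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    # Flush an entity that runs to the end of the sequence.
    if current_type:
        entities.append((current_type, " ".join(current_words)))
    return entities

words = ["Date:", "March", "3,", "1998", "Signature"]
tags = ["B-QUESTION", "B-ANSWER", "I-ANSWER", "I-ANSWER", "O"]
print(group_bio(words, tags))  # [('QUESTION', 'Date:'), ('ANSWER', 'March 3, 1998')]
```

Pairing each `QUESTION` with the following `ANSWER` is a natural next step for turning a form into key-value data.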


© 2025 AIbase