LayoutLMv2 Open-Source Document Visual Question Answering Model - Free Deployment to Assist Document Understanding Tasks

Layoutlmv2 Base Uncased Finetuned Docvqa

Developed by madiltalay

A document visual question answering model based on the LayoutLMv2 architecture, fine-tuned specifically for document understanding tasks

Document Question Answering

Transformers

#Document Visual Question Answering #Multimodal Understanding #Layout Awareness

Downloads 14

Release Time: 6/22/2023

Model Overview

This model is LayoutLMv2 base fine-tuned for the DocVQA task. It uses both the layout and the text content of a document to answer questions about it.

Model Features

Multimodal Understanding Capability

Processes both textual content and document layout information simultaneously

Document-Specific Optimization

Specially fine-tuned for document visual question answering tasks

End-to-End Training

Learns text, layout, and visual features jointly; with OCR enabled in the processor, a document image and a question are the only inputs needed
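
The layout information mentioned above reaches LayoutLMv2 as one bounding box per token, scaled to a 0-1000 coordinate grid so that it does not depend on page size. The following is a minimal sketch of that scaling, assuming pixel-space boxes in (x0, y0, x1, y1) order. The helper name `normalize_box` and the page dimensions are illustrative, not taken from the model card.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical word box on a 1700 x 2200 pixel scan.
print(normalize_box((170, 220, 850, 440), 1700, 2200))  # [100, 100, 500, 200]
```

When the LayoutLMv2 processor runs OCR itself, it produces normalized boxes already; a helper like this is only needed when words and boxes come from a separate OCR engine.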

Model Capabilities

Document Understanding

Visual Question Answering

Text Localization

Layout Analysis

Use Cases

Document Processing

Form Information Extraction

Extracts specific field information from structured documents

Document Q&A System

Answers natural language questions about document content

Enterprise Automation

Invoice Processing

Automatically identifies and extracts key information from invoices

🚀 layoutlmv2-base-uncased_finetuned_docvqa

This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased for document visual question answering.

🚀 Quick Start

The dataset used for fine-tuning is not recorded (the auto-generated card lists it as None). The model achieves the following results on the evaluation set:

Loss: 3.6030
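
The card itself gives no usage snippet. The following is a minimal inference sketch, not the author's code. It assumes the weights are published on the Hugging Face Hub as `madiltalay/layoutlmv2-base-uncased_finetuned_docvqa` (inferred from the developer and model names above, not confirmed by the card), that the processor can be taken from the base checkpoint, and that the extra LayoutLMv2 dependencies (detectron2, torchvision, pytesseract with a Tesseract install) are available. `best_span` and `answer_question` are illustrative helper names.

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Pick the (start, end) token pair with the highest summed score.

    Only spans with start <= end and at most max_answer_len tokens count.
    """
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        last = min(len(end_scores), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def answer_question(
    image_path,
    question,
    model_id="madiltalay/layoutlmv2-base-uncased_finetuned_docvqa",  # assumed Hub name
):
    """Run extractive question answering on a single page image."""
    # Heavy imports stay local so best_span can be used without them.
    import torch
    from PIL import Image
    from transformers import LayoutLMv2ForQuestionAnswering, LayoutLMv2Processor

    # By default the processor runs Tesseract OCR to obtain words and boxes.
    processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
    model = LayoutLMv2ForQuestionAnswering.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    encoding = processor(image, question, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**encoding)

    start, end = best_span(
        outputs.start_logits[0].tolist(),
        outputs.end_logits[0].tolist(),
    )
    answer_ids = encoding["input_ids"][0][start : end + 1]
    return processor.tokenizer.decode(answer_ids, skip_special_tokens=True).strip()


# Example call (the file name is hypothetical):
# print(answer_question("invoice.png", "What is the invoice number?"))
```

Searching for the best joint span, rather than taking the two argmax positions independently, avoids returning an end token that precedes the start token.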

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10
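
The values above map onto the Hugging Face `TrainingArguments` class. The sketch below shows one way to express them, under several assumptions: training ran on a single device (so `train_batch_size` equals the per-device batch size), evaluation ran every 50 steps (inferred from the results table), and `output_dir` is a hypothetical path. Argument names follow Transformers 4.30.2, the version listed under Framework versions; newer releases may differ.

```python
# Keyword arguments mirroring the hyperparameters listed on the card.
TRAINING_KWARGS = {
    "output_dir": "layoutlmv2-docvqa-run",  # hypothetical path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 4,  # assumes a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
    # Assumption: the results table reports one row every 50 steps.
    "evaluation_strategy": "steps",
    "eval_steps": 50,
    "logging_steps": 50,
}


def build_training_args(**overrides):
    """Create TrainingArguments; imported lazily because it needs transformers."""
    from transformers import TrainingArguments

    return TrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

Keeping the settings in a plain dictionary makes them easy to inspect or override, for example `build_training_args(num_train_epochs=4)`.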

Training results

Training Loss	Epoch	Step	Validation Loss
5.326	0.22	50	4.4949
4.292	0.44	100	3.9510
3.9419	0.66	150	3.9100
3.6895	0.88	200	3.5035
3.4052	1.11	250	3.4030
3.1405	1.33	300	3.2100
2.8966	1.55	350	2.9803
2.7874	1.77	400	2.7811
2.5385	1.99	450	2.4748
2.1532	2.21	500	2.5843
1.994	2.43	550	2.5459
1.8322	2.65	600	2.2316
1.7005	2.88	650	2.1888
1.4758	3.1	700	2.4578
1.3543	3.32	750	2.3368
1.1939	3.54	800	2.9737
1.294	3.76	850	2.4907
1.4519	3.98	900	1.9276
1.0517	4.2	950	2.9981
0.8171	4.42	1000	2.5618
1.0456	4.65	1050	2.3139
0.9222	4.87	1100	2.4243
0.758	5.09	1150	2.8167
0.7203	5.31	1200	2.9342
0.6748	5.53	1250	2.6396
0.6821	5.75	1300	2.5629
0.5898	5.97	1350	3.0276
0.3135	6.19	1400	3.2611
0.4407	6.42	1450	3.1793
0.5303	6.64	1500	3.0511
0.5294	6.86	1550	3.1106
0.3149	7.08	1600	3.2933
0.199	7.3	1650	3.4207
0.164	7.52	1700	3.4379
0.5258	7.74	1750	3.1339
0.336	7.96	1800	3.2394
0.3294	8.19	1850	3.0956
0.1587	8.41	1900	3.4282
0.2375	8.63	1950	3.3718
0.117	8.85	2000	3.5646
0.2873	9.07	2050	3.5213
0.2206	9.29	2100	3.5387
0.2503	9.51	2150	3.5683
0.0763	9.73	2200	3.6119
0.1344	9.96	2250	3.6030
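
In the table above, validation loss reaches its minimum of 1.9276 at step 900 (epoch 3.98) and then rises to 3.6030 by step 2250 while training loss keeps falling, a pattern consistent with overfitting. The sketch below shows how the lowest-loss row can be picked out programmatically. Only a subset of rows is copied in to keep it short, and the function name `best_checkpoint` is illustrative.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the table;
# only a subset is listed here to keep the example short.
ROWS = [
    (5.326, 0.22, 50, 4.4949),
    (2.5385, 1.99, 450, 2.4748),
    (1.7005, 2.88, 650, 2.1888),
    (1.4519, 3.98, 900, 1.9276),
    (0.5898, 5.97, 1350, 3.0276),
    (0.336, 7.96, 1800, 3.2394),
    (0.1344, 9.96, 2250, 3.6030),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


best = best_checkpoint(ROWS)
final = ROWS[-1]
print(f"best step:  {best[2]} (validation loss {best[3]:.4f})")
print(f"final step: {final[2]} (validation loss {final[3]:.4f})")
```

The evaluation loss reported on the card (3.6030) matches the final row, not the lowest one. When retraining with the Trainer API, options such as `load_best_model_at_end` together with `metric_for_best_model="eval_loss"` could be used to keep the lowest-loss checkpoint instead.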

Framework versions

Transformers 4.30.2
PyTorch 2.0.1+cu118
Datasets 2.13.1
Tokenizers 0.13.3

📄 License

This model is licensed under cc-by-nc-sa-4.0.
