LayoutLMv2-base-uncased_finetuned_docvqa Open-source Model - Accurately Answer Document Questions and Assist Document Understanding

Developed by rogdevil

This model is specialized for document visual question answering (DocVQA). It is based on Microsoft's LayoutLMv2 architecture and fine-tuned for document understanding tasks.

Document Question Answering

Transformers

#Document QA #Multimodal Understanding #Layout Awareness

Downloads 16

Release Time: 2/29/2024

Model Overview

The model is designed for visual question answering on document images. It relates a document's layout structure to its textual content when answering a question.

Model Features

Multimodal Understanding Capability

Simultaneously processes document text content and visual layout information

Document Structure Awareness

Capable of understanding complex document structures such as tables and forms

Efficient Fine-Tuning

Task-specific fine-tuning based on pre-trained models

Model Capabilities

Document Image Understanding

Visual Question Answering

Text Localization

Layout Analysis

Use Cases

Document Processing

Form Information Extraction

Automatically extracts key information from scanned form documents

Invoice Processing

Identifies key fields such as amounts and dates in invoices

Education

Automatic Test Grading

Recognizes handwritten or printed answers on student test papers

🚀 layoutlmv2-base-uncased_finetuned_docvqa

This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased on an unknown dataset. On the evaluation set, it reaches a loss of 4.6788 at the end of training (step 4500).

🚀 Quick Start

The fine-tuned checkpoint can be loaded with the Transformers library and applied directly to document question answering on scanned or photographed pages.
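A minimal loading sketch follows, using the Transformers `document-question-answering` pipeline. Several things here are assumptions rather than facts from this page: the repository id `rogdevil/layoutlmv2-base-uncased_finetuned_docvqa` is inferred from the developer and model names, `invoice.png` is a hypothetical input file, and the helper name `answer_question` is illustrative. LayoutLMv2 in Transformers also needs `detectron2` installed, and the pipeline's built-in OCR needs `pytesseract` plus the Tesseract binary.

```python
# Assumed repository id, inferred from the developer and model names on this page.
MODEL_ID = "rogdevil/layoutlmv2-base-uncased_finetuned_docvqa"


def answer_question(image_path, question, model_id=MODEL_ID):
    """Run document question answering on one image (illustrative helper)."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies (transformers, torch, detectron2, pytesseract).
    from PIL import Image
    from transformers import pipeline

    qa = pipeline("document-question-answering", model=model_id)
    image = Image.open(image_path).convert("RGB")
    # The pipeline runs OCR on the image, then predicts an answer span.
    return qa(image=image, question=question)


if __name__ == "__main__":
    # "invoice.png" is a hypothetical file; replace it with a real document image.
    print(answer_question("invoice.png", "What is the invoice date?"))
```

The pipeline returns candidate answers with confidence scores. Given the evaluation loss reported for this checkpoint, it is worth checking the answers against a few documents with known values before relying on them.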

📚 Documentation

Model description

This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased. No further details about its specific improvements or characteristics have been provided.

Intended uses & limitations

More information about the intended uses and limitations of this model is needed.

Training and evaluation data

Details about the training and evaluation data are not provided yet.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Property	Details
learning_rate	5e-05
train_batch_size	4
eval_batch_size	8
seed	42
optimizer	Adam with betas=(0.9, 0.999) and epsilon=1e-08
lr_scheduler_type	linear
num_epochs	20
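For anyone trying to reproduce a similar run, the hyperparameters above can be sketched as keyword arguments for `transformers.TrainingArguments`. This is a reconstruction under assumptions, not the author's actual training script: it assumes a single device (so `train_batch_size` maps to `per_device_train_batch_size`) and no gradient accumulation. The output directory name is hypothetical.

```python
# Hyperparameters from the table above, mapped to TrainingArguments field names.
# The per-device mapping assumes single-device training (an assumption).
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 20,
}


def build_training_arguments(output_dir="layoutlmv2-docvqa-out"):
    """Build TrainingArguments from HYPERPARAMS (output_dir is hypothetical)."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

Passing the result of `build_training_arguments()` to a `Trainer` would give a comparable optimizer and schedule. The dataset and preprocessing are not documented on this page, so they cannot be reproduced from it.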

Training results

Training Loss	Epoch	Step	Validation Loss
5.3193	0.22	50	4.5453
4.5115	0.44	100	4.1632
4.1316	0.66	150	3.8496
3.7911	0.88	200	3.7418
3.5175	1.11	250	3.9454
3.2171	1.33	300	3.0430
3.0377	1.55	350	3.1317
3.1081	1.77	400	2.8709
2.6219	1.99	450	2.9745
2.2922	2.21	500	3.0184
2.2245	2.43	550	2.6649
2.0918	2.65	600	2.3156
2.0339	2.88	650	2.4970
1.7088	3.1	700	2.2817
1.4584	3.32	750	2.3237
1.4296	3.54	800	2.1868
1.4413	3.76	850	2.2775
1.4055	3.98	900	2.6660
1.0251	4.2	950	2.6155
1.1251	4.42	1000	2.9841
1.059	4.65	1050	2.7376
1.0179	4.87	1100	3.7345
1.1128	5.09	1150	2.6704
0.8461	5.31	1200	3.0422
0.86	5.53	1250	3.2093
0.9124	5.75	1300	3.2782
0.8687	5.97	1350	3.1477
0.7039	6.19	1400	2.6896
0.8908	6.42	1450	3.0843
0.7408	6.64	1500	2.9585
0.6026	6.86	1550	3.3629
0.4852	7.08	1600	3.1505
0.5496	7.3	1650	3.6285
0.5578	7.52	1700	3.3481
0.5897	7.74	1750	3.3201
0.4487	7.96	1800	3.1462
0.2182	8.19	1850	3.7251
0.3524	8.41	1900	3.5870
0.4516	8.63	1950	3.6300
0.5658	8.85	2000	3.1085
0.4877	9.07	2050	3.5353
0.2226	9.29	2100	3.6744
0.2544	9.51	2150	4.1244
0.6194	9.73	2200	3.4775
0.3759	9.96	2250	3.7031
0.2718	10.18	2300	3.6076
0.1322	10.4	2350	3.6885
0.2596	10.62	2400	3.9328
0.1675	10.84	2450	4.1439
0.158	11.06	2500	4.4306
0.1462	11.28	2550	4.3744
0.2187	11.5	2600	4.4111
0.264	11.73	2650	3.9780
0.1997	11.95	2700	4.2383
0.1369	12.17	2750	4.1329
0.1204	12.39	2800	4.2738
0.2001	12.61	2850	4.0106
0.2132	12.83	2900	4.1816
0.1472	13.05	2950	4.4600
0.0603	13.27	3000	4.0050
0.0911	13.5	3050	4.1838
0.1016	13.72	3100	4.4429
0.0887	13.94	3150	4.1510
0.0495	14.16	3200	4.2938
0.0677	14.38	3250	4.6133
0.1263	14.6	3300	4.4634
0.1953	14.82	3350	3.9348
0.0212	15.04	3400	4.1726
0.0082	15.27	3450	4.3512
0.0432	15.49	3500	4.2992
0.0975	15.71	3550	4.2274
0.0933	15.93	3600	4.4028
0.024	16.15	3650	4.4662
0.0964	16.37	3700	4.3964
0.0487	16.59	3750	4.4827
0.0147	16.81	3800	4.5577
0.0951	17.04	3850	4.5640
0.0508	17.26	3900	4.4473
0.1163	17.48	3950	4.4565
0.0151	17.7	4000	4.5511
0.0569	17.92	4050	4.5298
0.0639	18.14	4100	4.5417
0.0155	18.36	4150	4.6398
0.0107	18.58	4200	4.7664
0.0044	18.81	4250	4.8119
0.0906	19.03	4300	4.7168
0.0533	19.25	4350	4.7032
0.0496	19.47	4400	4.6918
0.0938	19.69	4450	4.6824
0.0483	19.91	4500	4.6788
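The table above shows a pattern consistent with overfitting. Validation loss reaches its minimum of 2.1868 at step 800 (epoch 3.54) and then rises to 4.6788 by step 4500, while training loss keeps falling toward zero. The sketch below selects the best checkpoint by validation loss from a handful of rows copied verbatim from the table. The helper name `best_checkpoint` is illustrative, and whether the published weights correspond to the final or the best step is not stated on this page.

```python
# (step, epoch, training_loss, validation_loss): a subset of rows
# copied verbatim from the training results table above.
ROWS = [
    (50, 0.22, 5.3193, 4.5453),
    (400, 1.77, 3.1081, 2.8709),
    (800, 3.54, 1.4296, 2.1868),
    (1600, 7.08, 0.4852, 3.1505),
    (2500, 11.06, 0.158, 4.4306),
    (3500, 15.49, 0.0432, 4.2992),
    (4500, 19.91, 0.0483, 4.6788),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


best = best_checkpoint(ROWS)
final = ROWS[-1]
# How much worse the final checkpoint is than the best one on validation loss.
gap = round(final[3] - best[3], 4)

print(f"best step: {best[0]} (epoch {best[1]}), validation loss {best[3]}")
print(f"final step: {final[0]}, validation loss {final[3]}, gap {gap}")
```

In a new training run, early stopping or keeping the checkpoint with the lowest validation loss would be a reasonable way to avoid this drift.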

Framework versions

Property	Details
Transformers	4.38.1
Pytorch	2.2.1
Datasets	2.17.1
Tokenizers	0.15.2

📄 License

This model is released under the CC BY-NC-SA 4.0 license.
