The open-source model layoutlmv2-base-uncased_finetuned_docvqa - Powering document visual question answering and understanding tasks

Layoutlmv2 Base Uncased Finetuned Docvqa

Developed by hugginglaoda

A document visual question answering model based on the LayoutLMv2 architecture, specifically fine-tuned for document understanding tasks

Image-to-Text

Transformers

#Document Visual Question Answering #Multimodal Understanding #Layout Awareness

Downloads 16

Release Time: 4/1/2023

Model Overview

This model is a fine-tuned version of the LayoutLMv2 base model for Document Visual Question Answering (DocVQA). It uses both the layout and the content of a document to answer questions about it.

Model Features

Multimodal Understanding Capability

Combines textual content and visual layout information for document understanding

Document Structure Awareness

Capable of recognizing and utilizing structural information such as tables and paragraphs in documents

End-to-End Question Answering

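Answers questions directly from document images. LayoutLMv2 still relies on OCR for the words and their bounding boxes, but the Transformers processor can run it internally (Tesseract by default), so user code needs no separate OCR step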
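The layout information mentioned above reaches LayoutLM-family models as one bounding box per word, scaled to a 0 to 1000 coordinate range regardless of the page size. The sketch below shows that scaling step. The function name `normalize_box` and the sample coordinates are illustrative, not taken from this model's code, and the processor normally does this for you when it runs OCR.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range.

    Illustrative helper: LayoutLM-family models expect boxes on this scale.
    """
    x0, y0, x1, y1 = box

    def scale(value, size):
        # Multiply first, then divide, and clamp to the valid range.
        return max(0, min(1000, int(1000 * value / size)))

    return [
        scale(x0, page_width),
        scale(y0, page_height),
        scale(x1, page_width),
        scale(y1, page_height),
    ]


# Hypothetical word box on a 1000 x 2000 pixel page scan.
print(normalize_box((50, 100, 200, 150), page_width=1000, page_height=2000))
```

Normalizing to a fixed range lets the model share position embeddings across scans of different resolutions.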

Model Capabilities

Document Visual Question Answering

Document Understanding

Layout Analysis

Text Localization

Use Cases

Document Processing

Form Information Extraction

Extract specific field information from scanned forms

Contract Analysis

Answer specific questions about contract terms

Education

Automatic Test Grading

Answer grading-related questions based on scanned test papers

🚀 layoutlmv2-base-uncased_finetuned_docvqa

This model is a fine-tuned version of microsoft/layoutlmv2-base-uncased. The card records the training dataset as None, i.e. it was not specified.

🚀 Quick Start

At the end of training, the model achieves the following result on the evaluation set:

Loss: 4.8430
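The card itself gives no usage code. The following is a minimal sketch of how a checkpoint like this one is typically queried through the Transformers `document-question-answering` pipeline. The Hub id in `MODEL_ID` is an assumption pieced together from the developer and model names on this page, and the image path in the usage comment is hypothetical. LayoutLMv2 in Transformers also needs detectron2 installed, and the built-in OCR needs pytesseract with a Tesseract binary.

```python
# Assumed Hub id (developer name + model name from this page); verify before use.
MODEL_ID = "hugginglaoda/layoutlmv2-base-uncased_finetuned_docvqa"


def answer_question(image_path, question, model_id=MODEL_ID):
    """Ask one question about one document image.

    Returns the pipeline output: a list of candidate answers, each a dict
    that includes "answer" and "score" keys.
    """
    # Imported lazily so this module loads without the heavy dependencies.
    from PIL import Image
    from transformers import pipeline

    qa = pipeline("document-question-answering", model=model_id)
    image = Image.open(image_path).convert("RGB")
    return qa(image=image, question=question)


# Usage (hypothetical file name):
#   results = answer_question("scanned_form.png", "What is the invoice date?")
#   print(results[0]["answer"], results[0]["score"])
```

Given the final evaluation loss reported above, answers from this particular checkpoint may be unreliable, so it is worth checking the returned scores.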

📚 Documentation

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 20
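To make the `linear` scheduler setting concrete, here is a small sketch of a linearly decaying learning rate starting from 5e-05. Two values are assumptions rather than facts from the card: warmup is taken to be zero because none is listed, and the total of 4520 steps is an estimate (about 226 steps per epoch, inferred from the results table, times 20 epochs).

```python
def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Learning rate at a given optimizer step under linear decay.

    warmup_steps=0 is an assumption; the card does not list a warmup setting.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Estimated total: ~226 steps per epoch x 20 epochs (inferred, not stated).
TOTAL_STEPS = 4520

for step in (0, 1130, 2260, 3390, 4520):
    print(step, linear_lr(step, TOTAL_STEPS))
```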

Training results

Training Loss	Epoch	Step	Validation Loss
5.3379	0.22	50	4.6257
4.4305	0.44	100	4.2230
4.0588	0.66	150	3.9539
3.7822	0.88	200	3.7040
3.4957	1.11	250	3.4754
3.2417	1.33	300	3.1954
2.8607	1.55	350	2.8809
2.6602	1.77	400	2.9741
2.621	1.99	450	2.8658
2.1733	2.21	500	2.7248
2.106	2.43	550	2.4072
1.8389	2.65	600	2.4147
1.7862	2.88	650	2.2116
1.4224	3.1	700	2.4379
1.4773	3.32	750	2.4346
1.2225	3.54	800	2.5779
1.5368	3.76	850	2.4343
1.479	3.98	900	2.1432
0.7982	4.2	950	2.5897
0.8336	4.42	1000	2.8477
1.0647	4.65	1050	2.7111
0.8795	4.87	1100	2.5601
0.9265	5.09	1150	2.9547
0.7111	5.31	1200	3.1621
0.7244	5.53	1250	2.7862
0.9501	5.75	1300	2.4007
0.7424	5.97	1350	2.9918
0.4422	6.19	1400	3.5247
0.5952	6.42	1450	2.8743
0.7173	6.64	1500	2.7440
0.6311	6.86	1550	2.9658
0.393	7.08	1600	3.0994
0.3655	7.3	1650	3.3074
0.3432	7.52	1700	3.1921
0.5986	7.74	1750	3.3517
0.5456	7.96	1800	3.1552
0.565	8.19	1850	2.9922
0.3902	8.41	1900	3.6814
0.3408	8.63	1950	3.2820
0.241	8.85	2000	3.5644
0.3172	9.07	2050	3.4752
0.294	9.29	2100	3.7023
0.2993	9.51	2150	3.5031
0.0928	9.73	2200	4.0305
0.4598	9.96	2250	3.4260
0.2795	10.18	2300	3.2730
0.0887	10.4	2350	3.7174
0.3682	10.62	2400	3.4060
0.1924	10.84	2450	4.1368
0.1825	11.06	2500	4.1640
0.1987	11.28	2550	3.9908
0.0875	11.5	2600	4.1872
0.1719	11.73	2650	3.9948
0.2844	11.95	2700	4.1731
0.1085	12.17	2750	3.9568
0.1496	12.39	2800	3.9272
0.0701	12.61	2850	4.2957
0.1617	12.83	2900	4.2806
0.0934	13.05	2950	4.3200
0.0405	13.27	3000	4.1869
0.0898	13.5	3050	4.1207
0.189	13.72	3100	4.4437
0.0798	13.94	3150	4.6480
0.1199	14.16	3200	4.4105
0.0922	14.38	3250	4.4321
0.1556	14.6	3300	4.3353
0.1933	14.82	3350	4.0635
0.0164	15.04	3400	4.1792
0.064	15.27	3450	4.2202
0.0914	15.49	3500	4.2382
0.0287	15.71	3550	4.4255
0.1054	15.93	3600	4.5788
0.0306	16.15	3650	4.7566
0.0297	16.37	3700	4.6610
0.0529	16.59	3750	4.6494
0.0729	16.81	3800	4.6314
0.0388	17.04	3850	4.6675
0.0207	17.26	3900	4.7816
0.0889	17.48	3950	4.6941
0.0058	17.7	4000	4.6818
0.0068	17.92	4050	4.7755
0.0222	18.14	4100	4.7658
0.1152	18.36	4150	4.8247
0.0181	18.58	4200	4.8290
0.0349	18.81	4250	4.7989
0.0165	19.03	4300	4.8208
0.029	19.25	4350	4.8401
0.0073	19.47	4400	4.8544
0.0277	19.69	4450	4.8356
0.0164	19.91	4500	4.8430
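In the table above, validation loss reaches its minimum of 2.1432 at step 900 (epoch 3.98) and then climbs to 4.8430 by step 4500, while training loss keeps falling. This pattern usually indicates overfitting. The sketch below picks the lowest-validation-loss row from an excerpt of the table. The excerpt and the helper name `best_checkpoint` are illustrative, and the card does not say which checkpoint was actually published.

```python
# Excerpt of the results table: training loss, epoch, step, validation loss.
RESULTS = """
5.3379 0.22 50 4.6257
2.621 1.99 450 2.8658
1.479 3.98 900 2.1432
0.9501 5.75 1300 2.4007
0.4598 9.96 2250 3.4260
0.0164 19.91 4500 4.8430
"""


def best_checkpoint(table_text):
    """Return the row with the lowest validation loss as a dict."""
    rows = []
    for line in table_text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return min(rows, key=lambda row: row["val_loss"])


best = best_checkpoint(RESULTS)
print(best["step"], best["epoch"], best["val_loss"])
```

When retraining, evaluating at these intervals and keeping the best checkpoint (or stopping early) would avoid ending on the weaker final weights.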

Framework versions

Transformers 4.27.4
PyTorch 2.0.0+cu117
Datasets 2.11.0
Tokenizers 0.13.2

📄 License

This model is licensed under cc-by-nc-sa-4.0.
