Layout-XLM Open-source Document Understanding Model - Free Support for Multi-language Document Layout Analysis and Tag Classification

Layout Xlm Base Finetuned With DocLayNet Base At Linelevel Ml384

Developed by pierreguillou

A line-level document understanding model fine-tuned on the DocLayNet dataset based on the LayoutXLM base model, supporting multilingual document layout analysis and token classification.

Text Recognition

Transformers

Supports Multiple Languages | Open Source License: MIT | #Multilingual Document Understanding #Line-level Layout Analysis #Financial Document Processing

Downloads 103

Release Time: 3/2/2023

Model Overview

This model is specifically designed for document layout analysis and understanding, capable of identifying and classifying different elements in documents (such as text, headings, tables, etc.), suitable for processing various document types including financial reports, scientific papers, and legal documents.

Model Features

Multilingual Support

Supports document understanding in multiple languages including English, German, French, and Japanese.

Line-level Analysis

Fine-tuned at the line level on 384-token chunks with a 128-token overlap, providing detailed document element recognition.

High-performance Token Classification

Achieves an F1 score of 0.7336 and an accuracy of 0.9373 on the DocLayNet evaluation set.

Model Capabilities

Document Layout Analysis

Token Classification

Multilingual Text Understanding

Line-level Element Recognition

Use Cases

Financial Document Processing

Financial Report Analysis

Automatically identifies tables, headings, and body content in financial reports.

Improves the efficiency and accuracy of financial data extraction.

Academic Research

Scientific Paper Parsing

Extracts section headings, figures, and references from scientific papers.

Assists researchers in quickly obtaining structural information from papers.

Legal Document Processing

Contract Clause Identification

Automatically marks clauses, definitions, and signature areas in legal documents.

Speeds up the legal document review process.

🚀 Document Understanding model (finetuned LayoutXLM base at line level on DocLayNet base)

This model is a fine-tuned version of [microsoft/layoutxlm-base](https://huggingface.co/microsoft/layoutxlm-base) using the [DocLayNet base](https://huggingface.co/datasets/pierreguillou/DocLayNet-base) dataset. It excels in document understanding tasks, offering high-accuracy results for token classification and other related tasks.

✨ Features

- Multilingual Support: Supports multiple languages including English, German, French, and Japanese.
- High-Performance Metrics: Reaches an F1 score of 0.7336 and an accuracy of 0.9373 on the evaluation set (precision 0.7260, recall 0.7415).
- Fine-Tuned for Line-Level Analysis: Fine-tuned at the line level on chunks of 384 tokens with an overlap of 128 tokens, enabling detailed document analysis.

📚 Documentation

References

Blog posts

Layout XLM base
- (03/05/2023) Document AI | Inference APP and fine-tuning notebook for Document Understanding at line level with LayoutXLM base
LiLT base
- (02/16/2023) [Document AI | Inference APP and fine-tuning notebook for Document Understanding at paragraph level](https://medium.com/@pierre_guillou/document-ai-inference-app-and-fine-tuning-notebook-for-document-understanding-at-paragraph-level-c18d16e53cf8)
- (02/14/2023) [Document AI | Inference APP for Document Understanding at line level](https://medium.com/@pierre_guillou/document-ai-inference-app-for-document-understanding-at-line-level-a35bbfa98893)
- (02/10/2023) [Document AI | Document Understanding model at line level with LiLT, Tesseract and DocLayNet dataset](https://medium.com/@pierre_guillou/document-ai-document-understanding-model-at-line-level-with-lilt-tesseract-and-doclaynet-dataset-347107a643b8)
- (01/31/2023) [Document AI | DocLayNet image viewer APP](https://medium.com/@pierre_guillou/document-ai-doclaynet-image-viewer-app-3ac54c19956)
- (01/27/2023) [Document AI | Processing of DocLayNet dataset to be used by layout models of the Hugging Face hub (finetuning, inference)](https://medium.com/@pierre_guillou/document-ai-processing-of-doclaynet-dataset-to-be-used-by-layout-models-of-the-hugging-face-hub-308d8bd81cdb)

Notebooks (paragraph level)

LiLT base
- [Document AI | Inference APP at paragraph level with a Document Understanding model (LiLT fine-tuned on DocLayNet dataset)](https://github.com/piegu/language-models/blob/master/Gradio_inference_on_LiLT_model_finetuned_on_DocLayNet_base_in_any_language_at_levelparagraphs_ml512.ipynb)
- [Document AI | Inference at paragraph level with a Document Understanding model (LiLT fine-tuned on DocLayNet dataset)](https://github.com/piegu/language-models/blob/master/inference_on_LiLT_model_finetuned_on_DocLayNet_base_in_any_language_at_levelparagraphs_ml512.ipynb)
- [Document AI | Fine-tune LiLT on DocLayNet base in any language at paragraph level (chunk of 512 tokens with overlap)](https://github.com/piegu/language-models/blob/master/Fine_tune_LiLT_on_DocLayNet_base_in_any_language_at_paragraphlevel_ml_512.ipynb)

Notebooks (line level)

Layout XLM base
- [Document AI | Inference at line level with a Document Understanding model (LayoutXLM base fine-tuned on DocLayNet dataset)](https://github.com/piegu/language-models/blob/master/inference_on_LayoutXLM_base_model_finetuned_on_DocLayNet_base_in_any_language_at_levellines_ml384.ipynb)
- [Document AI | Inference APP at line level with a Document Understanding model (LayoutXLM base fine-tuned on DocLayNet base dataset)](https://github.com/piegu/language-models/blob/master/Gradio_inference_on_LayoutXLM_base_model_finetuned_on_DocLayNet_base_in_any_language_at_levellines_ml384.ipynb)
- [Document AI | Fine-tune LayoutXLM base on DocLayNet base in any language at line level (chunk of 384 tokens with overlap)](https://github.com/piegu/language-models/blob/master/Fine_tune_LayoutXLM_base_on_DocLayNet_base_in_any_language_at_linelevel_ml_384.ipynb)
LiLT base
- [Document AI | Inference at line level with a Document Understanding model (LiLT fine-tuned on DocLayNet dataset)](https://github.com/piegu/language-models/blob/master/inference_on_LiLT_model_finetuned_on_DocLayNet_base_in_any_language_at_levellines_ml384.ipynb)
- [Document AI | Inference APP at line level with a Document Understanding model (LiLT fine-tuned on DocLayNet dataset)](https://github.com/piegu/language-models/blob/master/Gradio_inference_on_LiLT_model_finetuned_on_DocLayNet_base_in_any_language_at_levellines_ml384.ipynb)
- [Document AI | Fine-tune LiLT on DocLayNet base in any language at line level (chunk of 384 tokens with overlap)](https://github.com/piegu/language-models/blob/master/Fine_tune_LiLT_on_DocLayNet_base_in_any_language_at_linelevel_ml_384.ipynb)
- [DocLayNet image viewer APP](https://github.com/piegu/language-models/blob/master/DocLayNet_image_viewer_APP.ipynb)
- Processing of DocLayNet dataset to be used by layout models of the Hugging Face hub (finetuning, inference)

APP

You can test this model with this APP in Hugging Face Spaces: [Inference APP for Document Understanding at line level (v2)](https://huggingface.co/spaces/pierreguillou/Inference-APP-Document-Understanding-at-linelevel-v2).

![Inference APP for Document Understanding at line level (v2)](https://huggingface.co/pierreguillou/layout-xlm-base-finetuned-with-DocLayNet-base-at-linelevel-ml384/resolve/main/app_layoutXLM_base_document_understanding_AI.png)

DocLayNet dataset

The DocLayNet dataset (IBM) provides page-by-page layout segmentation ground truth using bounding boxes for 11 distinct class labels on 80,863 unique pages from 6 document categories.

To date, the dataset can be downloaded through direct links or from the Hugging Face datasets library:

- direct links: [doclaynet_core.zip](https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/DocLayNet_core.zip) (28 GiB), [doclaynet_extra.zip](https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/DocLayNet_extra.zip) (7.5 GiB)
- Hugging Face datasets library: dataset DocLayNet

Paper: DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis (06/02/2022)

Model description

The model was fine-tuned at line level on chunks of 384 tokens with an overlap of 128 tokens. Because long pages are split into overlapping chunks rather than truncated, the model was trained with all layout and text data of all pages of the dataset.
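
The chunking described above can be sketched as a sliding window. This is a minimal illustration, not the code from the fine-tuning notebook: the function name `chunk_with_overlap` is hypothetical, special tokens are ignored, and the notebook may instead rely on the tokenizer's own overflow handling.

```python
from typing import List


def chunk_with_overlap(
    token_ids: List[int], max_length: int = 384, overlap: int = 128
) -> List[List[int]]:
    """Split a token sequence into windows of at most `max_length` tokens,
    where consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be in [0, max_length)")
    if not token_ids:
        return []
    stride = max_length - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        if start + max_length >= len(token_ids):
            break  # this window already reaches the end of the page
        start += stride
    return chunks


# Stand-in for the token ids of one page (assumed length: 1000 tokens).
page = list(range(1000))
for chunk in chunk_with_overlap(page):
    print(chunk[0], chunk[-1], len(chunk))
```

With these settings each window starts 256 tokens after the previous one, so every token of a long page ends up in at least one chunk and nothing is lost to truncation.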

At inference time, each line bounding box receives the label with the best probability computed for it.
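
One plausible reading of this step is sketched below: the per-token class probabilities are summed for every token that shares a line bounding box (including tokens seen twice because of chunk overlap), and the class with the highest total wins. The aggregation rule, the function name `label_lines`, and the two-class label map are assumptions for illustration; the inference notebook is the reference for the actual computation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) of a line


def label_lines(
    token_boxes: Sequence[Box],
    token_probs: Sequence[Sequence[float]],
    id2label: Dict[int, str],
) -> Dict[Box, str]:
    """Assign one label per line box by summing token probabilities."""
    num_classes = len(id2label)
    totals: Dict[Box, List[float]] = defaultdict(lambda: [0.0] * num_classes)
    for box, probs in zip(token_boxes, token_probs):
        for class_id, prob in enumerate(probs):
            totals[box][class_id] += prob
    labels = {}
    for box, scores in totals.items():
        best_class = max(range(num_classes), key=lambda i: scores[i])
        labels[box] = id2label[best_class]
    return labels


# Toy example: two of the DocLayNet classes, two lines, five tokens.
id2label = {0: "Text", 1: "Title"}
line_a, line_b = (10, 10, 200, 30), (10, 40, 200, 60)
boxes = [line_a, line_a, line_a, line_b, line_b]
probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8], [0.9, 0.1], [0.45, 0.55]]
print(label_lines(boxes, probs, id2label))
```

Summing over the whole line lets confident tokens outweigh a single uncertain one, which is why the second line is labelled from its totals rather than from its last token alone.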

Inference

See notebook: [Document AI | Inference at line level with a Document Understanding model (LayoutXLM base fine-tuned on DocLayNet dataset)](https://github.com/piegu/language-models/blob/master/inference_on_LayoutXLM_base_model_finetuned_on_DocLayNet_base_in_any_language_at_levellines_ml384.ipynb)

Training and evaluation data

See notebook: [Document AI | Fine-tune LayoutXLM base on DocLayNet base in any language at line level (chunk of 384 tokens with overlap)](https://github.com/piegu/language-models/blob/master/Fine_tune_LayoutXLM_base_on_DocLayNet_base_in_any_language_at_linelevel_ml_384.ipynb)

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
mixed_precision_training: Native AMP
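
To make the `linear` scheduler with a warmup ratio of 0.1 concrete, here is a small sketch of the learning rate it implies at a given step: a linear ramp from 0 to 2e-05 over the first 10% of steps, then a linear decay back to 0. The function name `linear_schedule_lr` is hypothetical, and the total of 7200 steps is an assumption taken from the last step logged in the training results; the true total may be slightly higher.

```python
def linear_schedule_lr(
    step: int,
    total_steps: int,
    base_lr: float = 2e-5,
    warmup_ratio: float = 0.1,
) -> float:
    """Learning rate under linear warmup followed by linear decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 7200  # assumption: last step logged in the training results
for step in (0, 360, 720, 3960, 7200):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under this assumption the peak learning rate is reached at step 720 and is halved again by step 3960.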

Training results

| Training Loss | Epoch | Step | Accuracy | F1 | Validation Loss | Precision | Recall |
|---|---|---|---|---|---|---|---|
| No log | 0.12 | 300 | 0.8413 | 0.1311 | 0.5185 | 0.1437 | 0.1205 |
| 0.9231 | 0.25 | 600 | 0.8751 | 0.5031 | 0.4108 | 0.4637 | 0.5498 |
| 0.9231 | 0.37 | 900 | 0.8887 | 0.5206 | 0.3911 | 0.5076 | 0.5343 |
| 0.369 | 0.5 | 1200 | 0.8724 | 0.5365 | 0.4118 | 0.5094 | 0.5667 |
| 0.2737 | 0.62 | 1500 | 0.8960 | 0.6033 | 0.3328 | 0.6046 | 0.6020 |
| 0.2737 | 0.75 | 1800 | 0.9186 | 0.6404 | 0.2984 | 0.6062 | 0.6787 |
| 0.2542 | 0.87 | 2100 | 0.9163 | 0.6593 | 0.3115 | 0.6324 | 0.6887 |
| 0.2542 | 1.0 | 2400 | 0.9198 | 0.6537 | 0.2878 | 0.6160 | 0.6962 |
| 0.1938 | 1.12 | 2700 | 0.9165 | 0.6752 | 0.3414 | 0.6673 | 0.6833 |
| 0.1581 | 1.25 | 3000 | 0.9193 | 0.6871 | 0.3611 | 0.6868 | 0.6875 |
| 0.1581 | 1.37 | 3300 | 0.9256 | 0.6822 | 0.2763 | 0.6988 | 0.6663 |
| 0.1428 | 1.5 | 3600 | 0.9287 | 0.7084 | 0.3065 | 0.7246 | 0.6929 |
| 0.1428 | 1.62 | 3900 | 0.9194 | 0.6812 | 0.2942 | 0.6866 | 0.6760 |
| 0.1025 | 1.74 | 4200 | 0.9347 | 0.7223 | 0.2990 | 0.7315 | 0.7133 |
| 0.1225 | 1.87 | 4500 | 0.9360 | 0.7048 | 0.2729 | 0.7249 | 0.6858 |
| 0.1225 | 1.99 | 4800 | 0.9396 | 0.7222 | 0.2826 | 0.7497 | 0.6966 |
| 0.108 | 2.12 | 5100 | 0.9301 | 0.7193 | 0.3071 | 0.7022 | 0.7372 |
| 0.108 | 2.24 | 5400 | 0.9334 | 0.7243 | 0.2999 | 0.7250 | 0.7237 |
| 0.0799 | 2.37 | 5700 | 0.9382 | 0.7254 | 0.2710 | 0.7310 | 0.7198 |
| 0.0793 | 2.49 | 6000 | 0.9329 | 0.7228 | 0.3201 | 0.7352 | 0.7108 |
| 0.0793 | 2.62 | 6300 | 0.9373 | 0.7336 | 0.3035 | 0.7260 | 0.7415 |
| 0.0696 | 2.74 | 6600 | 0.9374 | 0.7275 | 0.3137 | 0.7313 | 0.7237 |
| 0.0696 | 2.87 | 6900 | 0.9381 | 0.7253 | 0.3242 | 0.7369 | 0.7142 |
| 0.0866 | 2.99 | 7200 | 0.9407 | 0.7321 | 0.2473 | 0.7439 | 0.7207 |
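
The F1 values in the training results are consistent with the harmonic mean of the reported precision and recall. The short check below assumes that definition (the source does not state which averaging was used) and applies it to the step-6300 values, precision 0.7260 and recall 0.7415; the helper name `f1_from_precision_recall` is hypothetical.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Step 6300: reported precision, recall and F1.
precision, recall, reported_f1 = 0.7260, 0.7415, 0.7336
computed_f1 = f1_from_precision_recall(precision, recall)
print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1:.4f}")
```

A difference in the last decimal place is expected, because the reported precision and recall are themselves rounded to four digits.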

Framework versions

Transformers 4.26.1
PyTorch 1.10.0+cu111
Datasets 2.10.1
Tokenizers 0.13.2

Other models

Line level
- [Document Understanding model (finetuned LiLT base at line level on DocLayNet base)](https://huggingface.co/pierreguillou/lilt-xlm-roberta-base-finetuned-with-DocLayNet-base-at-linelevel-ml384) (accuracy | tokens: 85.84% - lines: 91.97%)
- [Document Understanding model (finetuned LayoutXLM base at line level on DocLayNet base)](https://huggingface.co/pierreguillou/layout-xlm-base-finetuned-with-DocLayNet-base-at-linelevel-ml384) (accuracy | tokens: 93.73% - lines: ...)
Paragraph level
- [Document Understanding model (finetuned LiLT base at paragraph level on DocLayNet base)](https://huggingface.co/pierreguillou/lilt-xlm-roberta-base-finetuned-with-DocLayNet-base-at-paragraphlevel-ml512) (accuracy | tokens: 86.34% - paragraphs: 68.15%)
- [Document Understanding model (finetuned LayoutXLM base at paragraph level on DocLayNet base)](https://huggingface.co/pierreguillou/layout-xlm-base-finetuned-with-DocLayNet-base-at-paragraphlevel-ml512) (accuracy | tokens: 96.93% - paragraphs: 86.55%)

📄 License

This project is licensed under the MIT license.
