layout-xlm-base-finetuned-with-DocLayNet-base-at-linelevel-ml384

Developed by pierreguillou
A line-level document understanding model: the LayoutXLM base model fine-tuned on the DocLayNet base dataset for multilingual document layout analysis through token classification.
Downloads 103
Release Time: 3/2/2023

Model Overview

This model is designed for document layout analysis: it identifies and classifies the elements of a document, such as text, headings, and tables. It is suited to a range of document types, including financial reports, scientific papers, and legal documents.

Model Features

Multilingual Support
Supports document understanding in multiple languages including English, German, French, and Japanese.
Line-level Analysis
Fine-tuned at the line level on chunks of 384 tokens, with a 128-token overlap between consecutive chunks, providing detailed document element recognition.
High-performance Token Classification
Achieves an F1 score of 0.7336 and an accuracy of 0.9373 on the DocLayNet evaluation set.
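
The chunking described under Line-level Analysis can be illustrated with a short sliding-window sketch. This is a minimal illustration, not the author's preprocessing code: the function name chunk_with_overlap is hypothetical, and the sketch assumes the 384-token limit applies to a plain list of token ids, ignoring any special tokens a real tokenizer would add.

```python
from typing import List


def chunk_with_overlap(
    token_ids: List[int],
    max_length: int = 384,  # chunk size stated in the model card
    overlap: int = 128,     # overlap stated in the model card
) -> List[List[int]]:
    """Split a token sequence into overlapping chunks (hypothetical helper)."""
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must be non-negative and smaller than max_length")
    if not token_ids:
        return []

    stride = max_length - overlap  # 256 new tokens per chunk with the defaults
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_length])
        # Stop once the current chunk reaches the end of the sequence.
        if start + max_length >= len(token_ids):
            break
        start += stride
    return chunks


if __name__ == "__main__":
    page = list(range(1000))  # stand-in for the token ids of one page
    for i, chunk in enumerate(chunk_with_overlap(page)):
        print(i, chunk[0], chunk[-1], len(chunk))
```

With the default values, each chunk after the first repeats the last 128 tokens of the previous one, so a line that straddles a chunk boundary is seen with context on at least one side.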

Model Capabilities

Document Layout Analysis
Token Classification
Multilingual Text Understanding
Line-level Element Recognition
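
Because the model classifies tokens while the target unit is a line, token predictions have to be combined into one label per line. The sketch below shows one plausible way to do that, a majority vote over the tokens of each line. It is an assumption for illustration, not a description of the author's post-processing: the function name, the line_ids input, and the voting rule are all hypothetical, and the label strings are used only as examples of layout categories.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def line_labels_from_tokens(
    line_ids: Sequence[int],
    token_labels: Sequence[str],
) -> Dict[int, str]:
    """Assign each line the most frequent label among its tokens (hypothetical helper)."""
    if len(line_ids) != len(token_labels):
        raise ValueError("line_ids and token_labels must have the same length")

    votes: Dict[int, Counter] = defaultdict(Counter)
    for line_id, label in zip(line_ids, token_labels):
        votes[line_id][label] += 1

    # most_common(1) returns the label with the highest count for that line.
    return {line_id: counter.most_common(1)[0][0] for line_id, counter in votes.items()}


if __name__ == "__main__":
    # Six tokens spread over three lines; line 0 has one dissenting token.
    line_ids = [0, 0, 0, 1, 1, 2]
    token_labels = ["Title", "Title", "Text", "Text", "Text", "Table"]
    print(line_labels_from_tokens(line_ids, token_labels))
```

A vote like this smooths out isolated token-level errors inside a line, at the cost of hiding lines where the token predictions genuinely disagree.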

Use Cases

Financial Document Processing
Financial Report Analysis
Automatically identifies tables, headings, and body content in financial reports.
Improves the efficiency and accuracy of financial data extraction.
Academic Research
Scientific Paper Parsing
Extracts section headings, figures, and references from scientific papers.
Assists researchers in quickly obtaining structural information from papers.
Legal Document Processing
Contract Clause Identification
Automatically marks clauses, definitions, and signature areas in legal documents.
Speeds up the legal document review process.