Lilt Xlm Roberta Base Finetuned With DocLayNet Base At Linelevel Ml384

Developed by pierreguillou
A line-level document understanding model, based on LiLT and fine-tuned on the DocLayNet dataset, that supports multilingual document layout analysis
Downloads 700
Release Time: 2/9/2023

Model Overview

This document understanding model is built on the LiLT architecture and fine-tuned on the DocLayNet dataset for line-level document layout analysis through token classification. It identifies 11 element types in documents, such as headings, text, tables, and images.
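LiLT-style models take a bounding box alongside each token, and the Hugging Face implementation expects those boxes on a 0-1000 scale. The sketch below shows one way to prepare line-level boxes under that convention. The function names and the page size are illustrative assumptions, not part of the model's published code.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in the page's own units, e.g. PDF points or pixels.
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, page_width: float, page_height: float) -> List[int]:
    """Scale a box to the 0-1000 integer grid used for layout inputs."""
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 / page_width,
        1000 * y0 / page_height,
        1000 * x1 / page_width,
        1000 * y1 / page_height,
    ]
    # Clamp so that boxes slightly outside the page stay in range.
    return [max(0, min(1000, int(round(v)))) for v in scaled]


def boxes_for_line_tokens(num_tokens: int, line_box: List[int]) -> List[List[int]]:
    """At line level, every token of a line carries that line's box."""
    return [list(line_box) for _ in range(num_tokens)]


# Hypothetical example: a text line on a 612 x 792 point page.
line_box = normalize_box((306, 396, 612, 792), page_width=612, page_height=792)
token_boxes = boxes_for_line_tokens(3, line_box)
```

Sharing one box across all tokens of a line is what makes the classification line-level rather than word-level or paragraph-level.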

Model Features

Multilingual Support
Supports document analysis in multiple languages including English, German, French, and Japanese
Line-level Analysis
Identifies the element type of each line in a document with an accuracy of 91.97%
Wide Document Type Support
Applicable to various document types such as financial reports, manuals, scientific articles, legal documents, patents, and government tenders
High-precision Element Recognition
Reaches high recognition accuracy for specific elements such as tables (97.65%) and formulas (98.02%)
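Because the model classifies tokens, a line-level result needs the token predictions of each line to be combined into a single label. A minimal sketch of one simple approach, a majority vote per line, is given below. This is an assumption for illustration and not necessarily how the author's own inference code aggregates predictions; the label strings follow DocLayNet's category names.

```python
from collections import Counter
from typing import Dict, Sequence


def line_labels(
    token_labels: Sequence[str], token_line_ids: Sequence[int]
) -> Dict[int, str]:
    """Pick the most frequent token label within each line."""
    if len(token_labels) != len(token_line_ids):
        raise ValueError("token_labels and token_line_ids must have equal length")

    votes: Dict[int, Counter] = {}
    for label, line_id in zip(token_labels, token_line_ids):
        votes.setdefault(line_id, Counter())[label] += 1

    # most_common(1) returns a one-item list holding the top (label, count) pair.
    return {line_id: counts.most_common(1)[0][0] for line_id, counts in votes.items()}


# Hypothetical predictions for five tokens spread over two lines.
predicted = ["Section-header", "Section-header", "Text", "Text", "Text"]
line_ids = [0, 0, 0, 1, 1]
result = line_labels(predicted, line_ids)
```

Here line 0 resolves to "Section-header" (two votes against one) and line 1 to "Text".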

Model Capabilities

Document Layout Analysis
Line-level Element Classification
Multilingual Document Processing
PDF Document Understanding
Text and Layout Joint Modeling

Use Cases

Document Processing Automation
Financial Report Analysis
Automatically identifies tables, headings, and body content in financial reports
Table recognition accuracy reaches 97.65%
Legal Document Processing
Extracts chapter headings, body text, and footnotes from legal documents
Chapter heading recognition accuracy is 76.92%
Knowledge Management
Scientific Literature Indexing
Automatically classifies formulas, images, and body text in scientific articles
Formula recognition accuracy reaches 98.02%
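The use cases above all come down to filtering classified lines by element type. A minimal sketch of that post-processing step follows, assuming the line-level predictions are already available as (text, label) pairs. The sample lines are invented for illustration, and the label strings follow DocLayNet's category names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_lines_by_label(lines: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect line texts under their predicted label, keeping document order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text, label in lines:
        grouped[label].append(text)
    return dict(grouped)


# Hypothetical line-level output for a short financial report excerpt.
classified = [
    ("1. Consolidated Results", "Section-header"),
    ("Revenue rose in the fourth quarter.", "Text"),
    ("Q3 Q4 Change", "Table"),
    ("See note 4 for details.", "Footnote"),
    ("Costs were flat year over year.", "Text"),
]

by_label = group_lines_by_label(classified)
headings = by_label.get("Section-header", [])
body_text = by_label.get("Text", [])
```

The same grouping serves the legal and scientific cases by reading the "Footnote" or "Formula" entries instead.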