PP-DocLayout_plus-L: Open-Source Document Layout Model for High-Precision Localization of 20 Common Document Elements


PP DocLayout Plus L

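Developed by PaddlePaddle

PP-DocLayout_plus-L is a high-precision model for locating layout regions in documents. It is based on RT-DETR-L and detects 20 common types of document elements.

Text recognition | Multilingual support | License: Apache-2.0 | #DocumentLayoutDetection #MultiClassLocalization #HighPrecisionOCRPreprocessing

Downloads: 1,308

Release date: 6/6/2025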

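Model Overview

This model is designed for layout analysis of document images. It accurately locates elements such as titles, paragraphs, tables, and formulas, and is suited to documents that mix Chinese and English.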

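Model Features

Multi-class detection

Detects 20 types of document elements, including text, titles, tables, and formulas

High accuracy

Reaches 83.2% mAP(0.5) on an in-house evaluation dataset

Broad applicability

Training data covers many document types, including papers, PPT slides, contracts, and ancient books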

Model Capabilities

Document layout analysis

Table detection

Formula detection

Title recognition

Text region localization

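Use Cases

Document processing

Academic paper analysis

Automatically identifies structures such as titles, abstracts, and references in papers

Helps produce structured paper content

Contract parsing

Locates key clauses and signature areas in contracts

Supports the contract review process

Education

Exam paper analysis

Identifies question and answer regions in exam papers

Supports automated grading systems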

🚀 PP-DocLayout_plus-L

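PP-DocLayout_plus-L is a high-precision layout region localization model trained with RT-DETR-L on an in-house dataset. The dataset includes Chinese and English papers, PPT slides, magazines with varied layouts, contracts, books, exam papers, ancient books, and research reports. The model covers 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnote, header, footer, algorithm, formula, formula number, image, table, seal, figure/table title, chart, sidebar text, and reference list.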

Key Metrics

模型	mAP(0.5) (%)
PP-DocLayout_plus-L	83.2

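Note: The accuracy above was evaluated on an in-house layout region detection dataset of 1,000 document images covering Chinese and English papers, magazines, newspapers, research reports, PPT slides, exam papers, and textbooks.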
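mAP(0.5) is mean average precision at an IoU threshold of 0.5: a predicted box counts as correct only if its intersection-over-union (IoU) with a still-unmatched ground-truth box of the same class is at least 0.5, and the per-class average precision (AP) values are then averaged. The sketch below illustrates that computation under stated assumptions: boxes are [x1, y1, x2, y2] lists (the layout of the `coordinate` field in the prediction output under Quick Start), matching is greedy by descending score, and AP uses all-point interpolation. It is a generic illustration, not the evaluation script behind the 83.2% figure.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """AP for one class. preds: list of (score, box); gts: list of boxes."""
    if not gts:
        return 0.0
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = [False] * len(gts)
    tp_flags = []
    for _, box in preds:
        # Pick the best still-unmatched ground truth with IoU >= iou_thr
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
        tp_flags.append(best_j >= 0)

    # Precision and recall after each ranked prediction
    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precisions.append(tp / k)
        recalls.append(tp / len(gts))

    # All-point interpolation: sum recall steps times the best precision to the right
    ap, prev_r = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_r) * max(precisions[i:])
        prev_r = r
    return ap


def mean_ap(per_class):
    """per_class: dict mapping label -> (preds, gts). Returns the mean of per-class AP."""
    aps = [average_precision(p, g) for p, g in per_class.values()]
    return sum(aps) / len(aps)


# Toy example: one miss (FP) ranked between two hits
gts = [[0, 0, 10, 10], [20, 20, 30, 30]]
preds = [(0.9, [0, 0, 10, 10]), (0.8, [50, 50, 60, 60]), (0.7, [20, 20, 30, 31])]
ap = average_precision(preds, gts)  # 0.5 * 1.0 + 0.5 * (2/3), about 0.833
classes = {"text": (preds, gts), "table": ([(0.95, [0, 0, 10, 10])], [[0, 0, 10, 10]])}
print(round(ap, 3), round(mean_ap(classes), 3))
```

Raising `iou_thr` makes matching stricter, which is why results reported at different IoU thresholds are not directly comparable.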

🚀 Quick Start

📦 Installation

1. Install PaddlePaddle

Install PaddlePaddle with pip using one of the following commands:

# For CUDA 11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# For CUDA 12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# For CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

For more installation details, see the official PaddlePaddle website.
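The three install commands differ only in the package name (`paddlepaddle-gpu` or `paddlepaddle`) and the last path segment of the index URL (`cu118`, `cu126`, or `cpu`). As an illustration, the hypothetical helper below builds the matching command from a CUDA version string. It covers only the three variants listed above and is not part of PaddlePaddle.

```python
BASE_URL = "https://www.paddlepaddle.org.cn/packages/stable"
# Only the CUDA variants shown in the commands above; other versions are not covered.
CUDA_TAGS = {"11.8": "cu118", "12.6": "cu126"}


def paddle_install_command(cuda_version=None, version="3.0.0"):
    """Hypothetical helper: return the pip command for a CUDA version (None means CPU)."""
    if cuda_version is None:
        package, tag = "paddlepaddle", "cpu"
    elif cuda_version in CUDA_TAGS:
        package, tag = "paddlepaddle-gpu", CUDA_TAGS[cuda_version]
    else:
        raise ValueError(f"unsupported CUDA version: {cuda_version}")
    return f"python -m pip install {package}=={version} -i {BASE_URL}/{tag}/"


print(paddle_install_command("12.6"))
```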

2. Install PaddleOCR

Install the latest PaddleOCR inference package from PyPI:

python -m pip install paddleocr

💻 Usage Examples

Basic Usage

You can try it out quickly with a single command:

paddleocr layout_detection \
    --model_name PP-DocLayout_plus-L \
    -i https://cdn-uploads.huggingface.co/production/uploads/63d7b8ee07cd1aa3c49a2026/N5C68HPVAI-xQAWTxpbA6.jpeg

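You can also integrate inference with the layout detection module into your own project. Before running the code below, download the sample image to your local machine.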
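The sample image can also be fetched from Python. A minimal sketch using only the standard library, assuming the image URL from the command-line example above; `download_sample` and `local_filename` are hypothetical helper names. The local filename is the last path segment of the URL, so it matches the name passed to `predict`.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Sample image URL from the command-line example above
SAMPLE_URL = (
    "https://cdn-uploads.huggingface.co/production/uploads/"
    "63d7b8ee07cd1aa3c49a2026/N5C68HPVAI-xQAWTxpbA6.jpeg"
)


def local_filename(url):
    """Return the last path segment of a URL (query string excluded)."""
    return os.path.basename(urlparse(url).path)


def download_sample(url=SAMPLE_URL, dest_dir="."):
    """Hypothetical helper: download url into dest_dir unless the file already exists."""
    path = os.path.join(dest_dir, local_filename(url))
    if not os.path.exists(path):
        urlretrieve(url, path)  # requires network access
    return path
```

Calling `download_sample()` saves `N5C68HPVAI-xQAWTxpbA6.jpeg` in the current directory, which is the relative path used in the `model.predict` call.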

from paddleocr import LayoutDetection

model = LayoutDetection(model_name="PP-DocLayout_plus-L")
output = model.predict("N5C68HPVAI-xQAWTxpbA6.jpeg", batch_size=1, layout_nms=True)
for res in output:
    res.print()
    res.save_to_img(save_path="./output/")
    res.save_to_json(save_path="./output/res.json")
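The `layout_nms=True` argument enables non-maximum suppression (NMS) on the predicted layout boxes, which drops lower-scoring boxes that heavily overlap a higher-scoring one. The exact rules PaddleOCR applies are not described here, so the sketch below is a generic, class-aware greedy NMS with an assumed IoU threshold of 0.5, not the library's implementation. It reads the same `label`, `score`, and `coordinate` keys that appear in the prediction output.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def class_aware_nms(boxes, iou_thr=0.5):
    """Keep boxes in descending score order; drop a box if a kept box
    with the same label overlaps it with IoU > iou_thr."""
    kept = []
    for box in sorted(boxes, key=lambda b: b["score"], reverse=True):
        suppressed = any(
            k["label"] == box["label"] and iou(k["coordinate"], box["coordinate"]) > iou_thr
            for k in kept
        )
        if not suppressed:
            kept.append(box)
    return kept


# Toy input: two overlapping text boxes, a table over the same area, a separate text box
boxes = [
    {"label": "text", "score": 0.9, "coordinate": [0, 0, 100, 100]},
    {"label": "text", "score": 0.8, "coordinate": [5, 5, 100, 100]},
    {"label": "table", "score": 0.7, "coordinate": [0, 0, 100, 100]},
    {"label": "text", "score": 0.6, "coordinate": [200, 200, 300, 300]},
]
kept = class_aware_nms(boxes)
print([(b["label"], b["score"]) for b in kept])
```

Because suppression is per label, the table and the text block covering the same area both survive; a class-agnostic variant would keep only one of them.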

Running the code prints the following result:

{'res': {'input_path': '/root/.paddlex/predict_input/N5C68HPVAI-xQAWTxpbA6.jpeg', 'page_index': None, 'boxes': [{'cls_id': 2, 'label': 'text', 'score': 0.9870168566703796, 'coordinate': [34.101395, 349.85275, 358.5929, 611.0788]}, {'cls_id': 2, 'label': 'text', 'score': 0.986599326133728, 'coordinate': [34.500305, 647.15753, 358.29437, 848.66925]}, {'cls_id': 2, 'label': 'text', 'score': 0.984662652015686, 'coordinate': [385.71417, 497.41037, 711.22656, 697.8426]}, {'cls_id': 8, 'label': 'table', 'score': 0.9841272234916687, 'coordinate': [73.76732, 105.94854, 321.95355, 298.85074]}, {'cls_id': 8, 'label': 'table', 'score': 0.983431875705719, 'coordinate': [436.95523, 105.81446, 662.71814, 313.4865]}, {'cls_id': 2, 'label': 'text', 'score': 0.9832285642623901, 'coordinate': [385.62766, 346.22888, 710.10205, 458.772]}, {'cls_id': 2, 'label': 'text', 'score': 0.9816107749938965, 'coordinate': [385.78085, 735.19293, 710.5613, 849.97656]}, {'cls_id': 6, 'label': 'figure_title', 'score': 0.9577467441558838, 'coordinate': [34.421764, 20.055021, 358.7124, 76.53721]}, {'cls_id': 6, 'label': 'figure_title', 'score': 0.9505674839019775, 'coordinate': [385.7235, 20.054104, 711.2928, 74.92819]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.9001894593238831, 'coordinate': [386.46353, 477.035, 699.4023, 490.07495]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.8846081495285034, 'coordinate': [35.413055, 627.7365, 185.58315, 640.522]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.8837621808052063, 'coordinate': [387.1759, 716.34235, 524.78345, 729.2588]}, {'cls_id': 0, 'label': 'paragraph_title', 'score': 0.8509567975997925, 'coordinate': [35.50049, 331.18472, 141.64497, 344.81168]}]}}
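Each entry in `boxes` carries `cls_id`, `label`, `score`, and `coordinate` ([x1, y1, x2, y2]). The sketch below post-processes such a result in plain Python: it keeps boxes at or above a score threshold and groups their coordinates by label, sorted top to bottom. It assumes the dict has the shape printed above (with or without the outer `'res'` key); the 0.5 threshold and the sort order are illustrative choices, not PaddleOCR behavior.

```python
from collections import defaultdict


def group_by_label(result, min_score=0.5):
    """Group box coordinates by label, keeping only boxes with score >= min_score."""
    res = result.get("res", result)  # accept the dict with or without the 'res' wrapper
    groups = defaultdict(list)
    for box in res["boxes"]:
        if box["score"] >= min_score:
            groups[box["label"]].append(box["coordinate"])
    for coords in groups.values():
        coords.sort(key=lambda c: (c[1], c[0]))  # top edge first, then left edge
    return dict(groups)


# A few boxes copied from the output above
result = {"res": {"boxes": [
    {"cls_id": 8, "label": "table", "score": 0.9841272234916687,
     "coordinate": [73.76732, 105.94854, 321.95355, 298.85074]},
    {"cls_id": 6, "label": "figure_title", "score": 0.9577467441558838,
     "coordinate": [34.421764, 20.055021, 358.7124, 76.53721]},
    {"cls_id": 0, "label": "paragraph_title", "score": 0.9001894593238831,
     "coordinate": [386.46353, 477.035, 699.4023, 490.07495]},
    {"cls_id": 0, "label": "paragraph_title", "score": 0.8509567975997925,
     "coordinate": [35.50049, 331.18472, 141.64497, 344.81168]},
]}}

groups = group_by_label(result)
for label, coords in groups.items():
    print(label, len(coords))
```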

The visualization image is shown below:

For details on the commands and parameters, refer to the documentation.

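Advanced Usage

A single model has limited capabilities, but a pipeline that combines several models can handle harder real-world scenarios.

PP-StructureV3

Layout analysis is a technique for extracting structured information from document images. PP-StructureV3 consists of the following six modules: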

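Layout detection module
General OCR sub-pipeline
Document image preprocessing sub-pipeline (optional)
Table recognition sub-pipeline (optional)
Seal recognition sub-pipeline (optional)
Formula recognition sub-pipeline (optional)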
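Conceptually, the layout detection module decides which sub-pipeline should handle each detected region. The sketch below illustrates that dispatch step with a hypothetical routing table. The labels `formula` and `seal`, the route names, and the rule that regions fall back to general OCR when their optional sub-pipeline is disabled are all assumptions for illustration; the actual PP-StructureV3 logic is not described here.

```python
# Hypothetical mapping from layout labels to the sub-pipeline that would process them.
# "table" and "text" appear in the model output above; "formula" and "seal" are assumed names.
ROUTES = {
    "table": "table_recognition",
    "formula": "formula_recognition",
    "seal": "seal_recognition",
}
DEFAULT_ROUTE = "general_ocr"


def route_regions(boxes, enabled):
    """Assign each detected region to a sub-pipeline.

    boxes: list of dicts with a 'label' key, as in the layout detection output.
    enabled: set of optional sub-pipelines that are switched on. In this sketch,
    a region whose dedicated sub-pipeline is disabled falls back to general OCR.
    """
    plan = []
    for box in boxes:
        route = ROUTES.get(box["label"], DEFAULT_ROUTE)
        if route != DEFAULT_ROUTE and route not in enabled:
            route = DEFAULT_ROUTE
        plan.append((box["label"], route))
    return plan


regions = [{"label": "text"}, {"label": "table"}, {"label": "formula"}]
plan = route_regions(regions, enabled={"table_recognition"})
print(plan)
```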

You can try the PP-StructureV3 pipeline with a single command:

paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/63d7b8ee07cd1aa3c49a2026/KP10tiSZfAjMuwZUSLtRp.png

Running pipeline inference takes just a few lines of code. Taking the PP-StructureV3 pipeline as an example:

from paddleocr import PPStructureV3

pipeline = PPStructureV3()
# pipeline = PPStructureV3(use_doc_orientation_classify=True) # use_doc_orientation_classify enables/disables the document orientation classification model
# pipeline = PPStructureV3(use_doc_unwarping=True) # use_doc_unwarping enables/disables the document unwarping module
# pipeline = PPStructureV3(use_textline_orientation=True) # use_textline_orientation enables/disables the text line orientation classification model
# pipeline = PPStructureV3(device="gpu") # device selects the inference device (here, GPU)
output = pipeline.predict("./KP10tiSZfAjMuwZUSLtRp.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the structured result for this image as JSON
    res.save_to_markdown(save_path="output")  # Save the result for this image as Markdown