Open-source SLANeXt_wireless model: convert table images to editable HTML for free


Slanext Wireless

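Developed by PaddlePaddle

SLANeXt_wireless is a table structure recognition model that converts non-editable table images into editable HTML.

Text recognition · Multilingual · License: Apache-2.0 · #TableStructureRecognition #HTMLConversion #DocumentProcessing

Downloads: 244

Released: 6/6/2025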

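Model overview

SLANeXt_wireless is a key component of a table recognition system. It locates the rows, columns, and cells of a table and outputs HTML code for the table region, which serves as input to the downstream table recognition pipeline.

Model features

High-accuracy table structure recognition

Locates the rows, columns, and cells of a table. The reported accuracy is 69.65%, measured in a joint test with SLANeXt_wired.

Multiple usage modes

Can be used as a standalone model or as part of a multi-model pipeline, depending on the scenario.

Editable output

Converts table images into editable HTML for downstream processing and reuse.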

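Model capabilities

Table structure recognition

Image-to-HTML conversion

Table row and column detection

Table cell localization

Use cases

Document processing

Financial statement recognition

Converts scanned financial statement images into editable HTML

Recognizes the table structure accurately and preserves the original layout

Data table extraction

Extracts data tables from documents and converts them into a structured format

Simplifies data analysis and processing

Office automation

PDF table conversion

Converts tables in PDFs into an editable format

Improves document processing efficiency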

🚀 SLANeXt_wireless

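Table structure recognition is a key component of a table recognition system. It converts non-editable table images into an editable table format such as HTML. Its goal is to locate the rows, columns, and cells of a table, and its performance directly affects the accuracy and efficiency of the whole table recognition system. The table structure recognition module typically outputs HTML code for the table region, which is then passed to the table recognition pipeline for further processing.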

Model	Accuracy (%)	GPU inference time (ms) [normal mode / high-performance mode]	CPU inference time (ms) [normal mode / high-performance mode]	Model storage size (MB)
SLANeXt_wireless	69.65	--	--	351

Note: The accuracy of SLANeXt_wireless comes from a joint test with SLANeXt_wired.

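🚀 Quick Start

✨ Key Features

Converts non-editable table images into editable HTML.
Supports several usage modes, including standalone use and use inside a multi-model pipeline.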

📦 Installation

1. Install PaddlePaddle

Use one of the following pip commands to install PaddlePaddle:

# For CUDA 11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# For CUDA 12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# For CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

For details on installing PaddlePaddle, refer to the official PaddlePaddle website.

2. Install PaddleOCR

Install the latest version of the PaddleOCR inference package from PyPI:

python -m pip install paddleocr
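
After installation, it can be useful to confirm which of these packages are actually present in the current environment. The following is a minimal sketch that uses only the Python standard library. It assumes the distribution names match the pip commands above (paddlepaddle-gpu, paddlepaddle, paddleocr); the helper names installed_version and report are hypothetical and not part of PaddlePaddle or PaddleOCR.

```python
from importlib import metadata

# Distribution names as they appear in the pip commands above.
CANDIDATES = ("paddlepaddle-gpu", "paddlepaddle", "paddleocr")


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report():
    """Map each candidate distribution name to its version (or None)."""
    return {name: installed_version(name) for name in CANDIDATES}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

Normally only one of paddlepaddle-gpu and paddlepaddle is installed, so one "not installed" line for those two is expected.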

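💻 Usage Examples

Basic usage

You can try the model with a single command:

paddleocr table_structure_recognition \
    --model_name SLANeXt_wireless \
    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/6rfhb-CXOHowonjpBsaUJ.png

You can also integrate inference with the table structure recognition module into your own project. Before running the following code, download the sample image to your local machine.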

from paddleocr import TableStructureRecognition
model = TableStructureRecognition(model_name="SLANeXt_wireless")
output = model.predict(input="6rfhb-CXOHowonjpBsaUJ.png", batch_size=1)
for res in output:
    res.print(json_format=False)
    res.save_to_json("./output/res.json")

After running, the result is:

{'res': {'input_path': '6rfhb-CXOHowonjpBsaUJ.png', 'page_index': None, 'bbox': [[5, 4, 48, 5, 46, 85, 5, 81], [84, 6, 146, 6, 143, 101, 83, 98], [186, 6, 217, 6, 212, 104, 184, 98], [239, 7, 281, 8, 276, 107, 235, 108], [324, 6, 405, 6, 404, 105, 323, 106], [405, 4, 488, 5, 488, 100, 403, 94], [3, 56, 96, 60, 95, 187, 3, 180], [108, 68, 157, 71, 159, 193, 110, 187], [179, 75, 207, 79, 211, 199, 184, 192], [238, 72, 277, 76, 281, 203, 243, 199], [318, 68, 400, 70, 404, 207, 325, 205], [395, 66, 494, 68, 494, 214, 397, 212], [11, 138, 62, 145, 68, 329, 12, 321], [105, 151, 156, 158, 171, 332, 117, 323], [177, 157, 210, 166, 229, 322, 197, 312], [232, 152, 276, 159, 295, 322, 253, 316], [313, 142, 396, 147, 409, 330, 332, 326], [392, 139, 491, 144, 492, 332, 404, 330], [3, 239, 86, 254, 103, 450, 3, 445], [97, 251, 152, 261, 176, 458, 116, 454], [172, 254, 211, 265, 239, 461, 200, 458], [235, 248, 289, 257, 316, 466, 264, 464], [310, 235, 402, 242, 419, 469, 337, 468], [381, 229, 491, 236, 492, 469, 400, 468], [9, 340, 74, 361, 88, 490, 11, 489], [95, 338, 129, 353, 150, 493, 113, 492], [176, 342, 192, 358, 221, 493, 206, 492], [235, 335, 261, 351, 289, 493, 265, 492], [310, 325, 372, 339, 393, 493, 338, 493], [382, 321, 482, 334, 485, 493, 402, 493]], 'structure': ['<html>', '<body>', '<table>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</table>', '</body>', '</html>'], 'structure_score': 0.9999998}}
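
In the result above, 'structure' is a list of HTML tokens and 'bbox' holds one 8-value polygon per cell. The sketch below shows one way to join the tokens into an HTML string and to attach a (row, col) index and an axis-aligned rectangle to each cell. It assumes the token layout seen in this output, that is, one '<td></td>' token per cell with no rowspan or colspan, and boxes listed in the same order as the cells. The helper names polygon_to_rect and cells_with_positions are hypothetical, and the sample dictionary is a truncated copy of the inner 'res' dictionary.

```python
def polygon_to_rect(poly):
    """Reduce an 8-value polygon [x1, y1, ..., x4, y4] to (xmin, ymin, xmax, ymax)."""
    xs, ys = poly[0::2], poly[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def cells_with_positions(res):
    """Pair each '<td></td>' token with its box and its (row, col) index.

    Assumption: one box per '<td></td>' token, in reading order, no spans.
    """
    cells, row, col = [], -1, 0
    boxes = iter(res["bbox"])
    for token in res["structure"]:
        if token == "<tr>":
            row, col = row + 1, 0
        elif token == "<td></td>":
            box = next(boxes, None)  # None if fewer boxes than cells
            rect = polygon_to_rect(box) if box else None
            cells.append({"row": row, "col": col, "rect": rect})
            col += 1
    return cells


# Truncated sample: the first two cells of the first row from the output above.
sample = {
    "bbox": [[5, 4, 48, 5, 46, 85, 5, 81], [84, 6, 146, 6, 143, 101, 83, 98]],
    "structure": ["<html>", "<body>", "<table>", "<tr>", "<td></td>",
                  "<td></td>", "</tr>", "</table>", "</body>", "</html>"],
}

html = "".join(sample["structure"])  # empty-cell HTML skeleton of the table
cells = cells_with_positions(sample)
print(html)
print(cells)
```

Tables with merged cells would need span-aware handling, which this sketch deliberately leaves out.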

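Advanced usage

General Table Recognition V2 pipeline

A single model has limited capability, but a pipeline composed of several models can handle harder real-world cases. The General Table Recognition V2 pipeline solves table recognition tasks by extracting information from an image and outputting it in HTML or Excel format. The pipeline contains 8 modules:

Table classification module
Table structure recognition module
Table cell detection module
Text detection module
Text recognition module
Layout region detection module (optional)
Document image orientation classification module (optional)
Text image unwarping module (optional)

Try the General Table Recognition V2 pipeline with a single command: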

paddleocr table_recognition_v2 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png  \
    --use_doc_orientation_classify False  \
    --use_doc_unwarping False \
    --save_path ./output \
    --device gpu:0

The result is printed to the terminal:

{'res': {'input_path': 'mabagznApI1k9R8qFoTLc.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 8, 'label': 'table', 'score': 0.86655592918396, 'coordinate': [0.0125130415, 0.41920784, 1281.3737, 585.3884]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['部门', '报销人', '报销事由', '批准人：', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'rec_boxes': array([[   9, ...,   59],
       ...,
       [1046, ...,  573]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([ 0.13052222, ..., 73.08310249]), array([104.43082511, ...,  73.27777413]), array([319.39041221, ...,  73.30439308]), array([424.2436837 , ...,  73.44736794]), array([580.75836265, ...,  73.24003914]), array([723.04370201, ...,  73.22717598]), array([984.67315757, ...,  73.20420387]), array([1.25130415e-02, ..., 5.85419208e+02]), array([984.37072837, ..., 137.02281502]), array([984.26586998, ..., 201.22290352]), array([984.24017417, ..., 585.30775765]), array([1039.90606773, ...,  265.44664314]), array([1039.69549644, ...,  329.30540779]), array([1039.66546714, ...,  393.57319954]), array([1039.5122689 , ...,  457.74644783]), array([1039.55535972, ...,  521.73030403]), array([1039.58612144, ...,  585.09468392])], 'pred_html': '<html><body><table><tbody><tr><td>部门</td><td></td><td>报销人</td><td></td><td>报销事由</td><td></td><td colspan="2">批准人：</td></tr><tr><td colspan="6" rowspan="8"></td><td colspan="2">单据 张</td></tr><tr><td colspan="2">合计金额 元</td></tr><tr><td rowspan="6">其 中</td><td>车费票</td></tr><tr><td>火车费票</td></tr><tr><td>飞机票</td></tr><tr><td>旅住宿费</td></tr><tr><td>其他</td></tr><tr><td>补贴</td></tr></tbody></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'rec_texts': ['部门', '报销人', '报销事由', '批准人：', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_boxes': array([[   9, ...,   59],
       ...,
       [1046, ...,  573]], dtype=int16)}}]}}

If save_path is specified, the visualization results are saved under save_path. [Figure: visualization output]
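
The pred_html string in the result above encodes merged cells with rowspan and colspan. For downstream analysis it can help to expand such HTML into a rectangular grid in which every merged cell repeats its text. The following is a minimal sketch using only the Python standard library. TableGridParser and html_to_grid are hypothetical helper names, not part of PaddleOCR, and the sample string is a small made-up table rather than pipeline output.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect cells as (text, rowspan, colspan) tuples, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan", 1)), int(a.get("colspan", 1))]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def html_to_grid(html):
    """Expand rowspan/colspan so that the table becomes a list of equal-length rows."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    grid = {}  # (row, col) -> text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots filled by spans from earlier rows
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


sample = ('<table><tr><td>A</td><td colspan="2">B</td></tr>'
          '<tr><td rowspan="2">C</td><td>D</td><td>E</td></tr>'
          '<tr><td>F</td><td>G</td></tr></table>')
for row in html_to_grid(sample):
    print(row)
```

Repeating the text of a merged cell in every slot it covers is only one possible convention; leaving the extra slots empty is an equally reasonable choice when exporting to a spreadsheet.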

The command line is convenient for a quick trial. Integrating the pipeline into a project also takes only a few lines of code:

from paddleocr import TableRecognitionPipelineV2

pipeline = TableRecognitionPipelineV2(
    use_doc_orientation_classify=False,  # enable/disable the document orientation classification model
    use_doc_unwarping=False,  # enable/disable the document unwarping module
)
# pipeline = TableRecognitionPipelineV2(use_doc_orientation_classify=True)  # turn on the document orientation classification model
# pipeline = TableRecognitionPipelineV2(use_doc_unwarping=True)  # turn on the text image unwarping module
# pipeline = TableRecognitionPipelineV2(device="gpu")  # run model inference on the GPU
output = pipeline.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png")
for res in output:
    res.print()  # print the structured prediction output
    res.save_to_img("./output/")  # save the visualization image
    res.save_to_xlsx("./output/")  # save the table as an Excel file
    res.save_to_html("./output/")  # save the table as HTML
    res.save_to_json("./output/")  # save the structured result as JSON

For details on the commands and parameters, refer to the documentation.
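
To process a whole folder of table images instead of a single file, the loop above can be wrapped in a small helper. This is a minimal sketch under stated assumptions: that predict accepts a local file path (as in the single-model example earlier) and that save_to_html and save_to_json accept a directory path, as in the code above. The names find_images, export_tables, and IMAGE_SUFFIXES are hypothetical, as are the folder names in the usage comment.

```python
from pathlib import Path

# File extensions treated as images in this sketch (an assumption, adjust as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def find_images(folder):
    """Return image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def export_tables(pipeline, folder, out_dir):
    """Run `pipeline.predict` on each image and save HTML and JSON per image.

    `pipeline` is any object with a `predict(path)` method that yields results
    exposing `save_to_html` and `save_to_json`.
    """
    processed = []
    for image in find_images(folder):
        target = Path(out_dir) / image.stem  # one output folder per input image
        target.mkdir(parents=True, exist_ok=True)
        for res in pipeline.predict(str(image)):
            res.save_to_html(str(target))
            res.save_to_json(str(target))
        processed.append(image.name)
    return processed


# Usage, assuming a pipeline created as in the example above:
# pipeline = TableRecognitionPipelineV2(use_doc_orientation_classify=False, use_doc_unwarping=False)
# export_tables(pipeline, "./scans", "./output")
```

Passing the pipeline in as an argument keeps model loading outside the loop, so the models are loaded once for the whole batch.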

PP-StructureV3

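Layout analysis is a technique for extracting structured information from document images. PP-StructureV3 consists of the following six modules:

Layout detection module
General OCR pipeline
Document image preprocessing pipeline (optional)
Table recognition pipeline (optional)
Seal recognition pipeline (optional)
Formula recognition pipeline (optional)

Try the PP-StructureV3 pipeline with a single command: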

paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --device gpu:0

The result is printed to the terminal. If save_path is specified, the results are saved under save_path.
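
When the CLI has to be driven from a script, the command shown above can be assembled as an argument list rather than a hand-written string. The sketch below only builds and prints the command; the subcommand and flag names are copied from the commands in this document, while build_paddleocr_command is a hypothetical helper name. It assumes every option is passed as "--name value", which matches the flags shown here but is not verified for other options.

```python
import shlex


def build_paddleocr_command(subcommand, input_path, **options):
    """Assemble a `paddleocr <subcommand>` argument list.

    Each keyword option becomes `--name value`, with the value passed through str().
    """
    args = ["paddleocr", subcommand, "-i", input_path]
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    return args


cmd = build_paddleocr_command(
    "pp_structurev3",
    "mG4tnwfrvECoFMu-S9mxo.png",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    device="gpu:0",
)
print(shlex.join(cmd))
# To execute it from Python: subprocess.run(cmd, check=True)
```

Building a list and handing it to subprocess avoids shell quoting problems with file names that contain spaces.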

A few lines of code are enough to run pipeline inference. Taking the PP-StructureV3 pipeline as an example:

from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    use_doc_orientation_classify=False,  # enable/disable the document orientation classification model
    use_doc_unwarping=False,  # enable/disable the document unwarping module
    use_textline_orientation=False,  # enable/disable the text line orientation classification model
    device="gpu:0",  # run model inference on the GPU
)
output = pipeline.predict("./mG4tnwfrvECoFMu-S9mxo.png")
for res in output:
    res.print()  # print the structured prediction output
    res.save_to_json(save_path="output")  # save the structured result of the current image as JSON
    res.save_to_markdown(save_path="output")  # save the result of the current image as Markdown