SLANeXt_wired Open-Source Table Recognition Model - Convert Table Images to Editable HTML for Free

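Developed by PaddlePaddle

SLANeXt_wired is a deep learning model for table structure recognition. It converts non-editable table images into an editable table format such as HTML.

Text recognition · Multilingual support · License: Apache-2.0 · #TableStructureRecognition #HTMLConversion #DocumentProcessing

Downloads: 1,141

Released: 6/6/2025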

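Model Overview

The model is a key component of a table recognition system. It focuses on locating the rows, columns, and cells of a table and outputs HTML code for the table region, which serves as input to the downstream table recognition pipeline.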

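Model Features

High-accuracy table structure recognition

Accurately locates the rows, columns, and cells of a table and outputs structured HTML code.

Integration into complete pipelines

Can be used as part of the General Table Recognition V2 pipeline and the PP-StructureV3 pipeline, working together with other modules.

Multiple output formats

Recognition results can be saved in several formats, including JSON, HTML, and Excel.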

Model Capabilities

Table structure recognition

HTML code generation

Table image analysis

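Use Cases

Document processing

Expense claim form processing

Recognizes the table structure in expense claim forms so that fields such as department, claimant, and amount can be extracted.

Outputs structured HTML code for downstream data processing and analysis.

Financial statement analysis

Converts paper financial statements into an editable digital format.

Accurately recognizes table structure and preserves the original data relationships.

Data entry automation

Paper form digitization

Converts scanned paper forms into editable spreadsheets.

Reduces manual data entry and improves data accuracy.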

🚀 SLANeXt_wired

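Table structure recognition is a key component of a table recognition system. It converts non-editable table images into an editable table format such as HTML. Its goal is to locate the rows, columns, and cells of a table, and its performance directly affects the accuracy and efficiency of the whole table recognition system. The table structure recognition module typically outputs HTML code for the table region, which is then passed as input to the table recognition pipeline for further processing.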

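Property	Details
Model type	SLANeXt_wired
Training data	Not specified

Model	Accuracy (%)	GPU inference time (ms) [normal mode / high-performance mode]	CPU inference time (ms) [normal mode / high-performance mode]	Model storage size (M)
SLANeXt_wired	69.65	--	--	351M

Note: The accuracy of SLANeXt_wired comes from a joint test with SLANeXt_wireless.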

🚀 Quick Start

Install Dependencies

1. Install PaddlePaddle

Use one of the following pip commands to install PaddlePaddle:

# For CUDA 11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# For CUDA 12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# For CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

For details on installing PaddlePaddle, see the official PaddlePaddle website.

2. Install PaddleOCR

Install the latest version of the PaddleOCR inference package from PyPI:

python -m pip install paddleocr
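
Before running the examples, it can help to confirm that both packages are visible to the current Python interpreter. The following is a minimal sketch that uses only the Python standard library. The helper names `check_install` and `installed_version` are hypothetical, and the sketch assumes the import names are `paddle` and `paddleocr` and that the pip distribution names match the install commands above.

```python
from importlib import metadata, util


def check_install(modules=("paddle", "paddleocr")):
    """Return {module_name: bool} telling whether each module can be found."""
    return {name: util.find_spec(name) is not None for name in modules}


def installed_version(distribution):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed import names: "paddle" (PaddlePaddle) and "paddleocr".
    print(check_install())
    # Assumed distribution names, taken from the pip commands in this guide.
    for dist in ("paddlepaddle", "paddlepaddle-gpu", "paddleocr"):
        print(dist, installed_version(dist))
```

Only one of `paddlepaddle` and `paddlepaddle-gpu` is expected to be present, depending on which install command was used.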

💻 Usage Examples

Basic Usage

You can try the model with a single command:

paddleocr table_structure_recognition \
    --model_name SLANeXt_wired \
    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/JUU_5wJWVo4PcmJhSdIo3.png

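You can also integrate inference with the table structure recognition module into your own project. Before running the code below, download the sample image to your local machine.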

from paddleocr import TableStructureRecognition
model = TableStructureRecognition(model_name="SLANeXt_wired")
output = model.predict(input="JUU_5wJWVo4PcmJhSdIo3.png", batch_size=1)
for res in output:
    res.print(json_format=False)
    res.save_to_json("./output/res.json")

After running, the result is as follows:

{'res': {'input_path': 'JUU_5wJWVo4PcmJhSdIo3.png', 'page_index': None, 'bbox': [[12, 4, 96, 5, 87, 43, 11, 40], [188, 4, 276, 5, 261, 50, 174, 48], [352, 4, 477, 4, 477, 57, 341, 55], [9, 36, 133, 38, 126, 95, 8, 93], [211, 40, 282, 40, 269, 104, 198, 103], [330, 38, 476, 39, 476, 106, 320, 106], [49, 72, 107, 76, 105, 187, 47, 182], [215, 78, 284, 80, 280, 180, 212, 177], [334, 72, 476, 73, 476, 175, 333, 175], [6, 140, 145, 153, 149, 247, 6, 233], [197, 149, 282, 158, 285, 254, 201, 245], [302, 144, 476, 152, 476, 254, 305, 246], [32, 196, 100, 208, 107, 312, 34, 299], [193, 198, 270, 209, 282, 318, 206, 309], [322, 192, 475, 202, 475, 327, 333, 319], [5, 257, 122, 271, 137, 379, 6, 370], [171, 262, 260, 273, 277, 392, 188, 386], [296, 257, 476, 265, 476, 398, 313, 394], [17, 319, 107, 322, 120, 454, 20, 454], [155, 319, 266, 320, 284, 457, 173, 457], [288, 307, 475, 308, 475, 460, 307, 460], [12, 426, 101, 425, 103, 475, 12, 475], [154, 399, 279, 390, 285, 475, 160, 476], [289, 388, 475, 380, 475, 475, 297, 476]], 'structure': ['<html>', '<body>', '<table>', '<thead>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</tbody>', '</table>', '</body>', '</html>'], 'structure_score': 0.9998931}}
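
In the result above, `structure` is a list of HTML tokens and `bbox` is a list of 8-value cell polygons. As a rough illustration of how such a result might be post-processed, the sketch below joins the tokens into one HTML string, counts rows and cells, and converts a polygon into an axis-aligned box. The function names are hypothetical, and the sketch assumes each polygon is ordered as `[x1, y1, x2, y2, x3, y3, x4, y4]`, which the output above suggests but does not state.

```python
def structure_to_html(tokens):
    """Concatenate the structure tokens into a single HTML string."""
    return "".join(tokens)


def count_rows_and_cells(tokens):
    """Count opening <tr> and <td> tokens (with or without attributes)."""
    rows = sum(1 for t in tokens if t.startswith("<tr"))
    cells = sum(1 for t in tokens if t.startswith("<td"))
    return rows, cells


def polygon_to_rect(poly):
    """Convert [x1, y1, ..., x4, y4] into (xmin, ymin, xmax, ymax).

    Assumes the values alternate x, y; this ordering is an assumption.
    """
    xs, ys = poly[0::2], poly[1::2]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # A shortened token list in the same style as the printed result.
    tokens = ["<table>", "<tr>", "<td></td>", "<td></td>", "</tr>", "</table>"]
    print(structure_to_html(tokens))
    print(count_rows_and_cells(tokens))
    # First polygon from the printed result.
    print(polygon_to_rect([12, 4, 96, 5, 87, 43, 11, 40]))
```

In the printed result, the structure has 8 rows of 3 cells each, which matches the 24 polygons in `bbox`, so the cells and boxes can be paired in order.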

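Advanced Usage

General Table Recognition V2 Pipeline

The General Table Recognition V2 pipeline handles table recognition tasks: it extracts information from an image and outputs it in HTML or Excel format. The pipeline consists of 8 modules:

Table classification module
Table structure recognition module
Table cell detection module
Text detection module
Text recognition module
Layout region detection module (optional)
Document image orientation classification module (optional)
Text image unwarping module (optional)

You can try the General Table Recognition V2 pipeline with the following command: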

paddleocr table_recognition_v2 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png  \
    --use_doc_orientation_classify False  \
    --use_doc_unwarping False \
    --save_path ./output \
    --device gpu:0

The results are printed to the terminal:

{'res': {'input_path': 'mabagznApI1k9R8qFoTLc.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 8, 'label': 'table', 'score': 0.86655592918396, 'coordinate': [0.0125130415, 0.41920784, 1281.3737, 585.3884]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['部門', '報銷人', '報銷事由', '批准人：', '單據', '張', '合計金額', '元', '車費票', '其', '火車費票', '飛機票', '中', '旅住宿費', '其他', '補貼'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'rec_boxes': array([[   9, ...,   59],
       ...,
       [1046, ...,  573]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([ 0.13052222, ..., 73.08310249]), array([104.43082511, ...,  73.27777413]), array([319.39041221, ...,  73.30439308]), array([424.2436837 , ...,  73.44736794]), array([580.75836265, ...,  73.24003914]), array([723.04370201, ...,  73.22717598]), array([984.67315757, ...,  73.20420387]), array([1.25130415e-02, ..., 5.85419208e+02]), array([984.37072837, ..., 137.02281502]), array([984.26586998, ..., 201.22290352]), array([984.24017417, ..., 585.30775765]), array([1039.90606773, ...,  265.44664314]), array([1039.69549644, ...,  329.30540779]), array([1039.66546714, ...,  393.57319954]), array([1039.5122689 , ...,  457.74644783]), array([1039.55535972, ...,  521.73030403]), array([1039.58612144, ...,  585.09468392])], 'pred_html': '<html><body><table><tbody><tr><td>部門</td><td></td><td>報銷人</td><td></td><td>報銷事由</td><td></td><td colspan="2">批准人：</td></tr><tr><td colspan="6" rowspan="8"></td><td colspan="2">單據 張</td></tr><tr><td colspan="2">合計金額 元</td></tr><tr><td rowspan="6">其 中</td><td>車費票</td></tr><tr><td>火車費票</td></tr><tr><td>飛機票</td></tr><tr><td>旅住宿費</td></tr><tr><td>其他</td></tr><tr><td>補貼</td></tr></tbody></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'rec_texts': ['部門', '報銷人', '報銷事由', '批准人：', '單據', '張', '合計金額', '元', '車費票', '其', '火車費票', '飛機票', '中', '旅住宿費', '其他', '補貼'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_boxes': array([[   9, ...,   59],
       ...,
       [1046, ...,  573]], dtype=int16)}}]}}

If save_path is specified, the visualization results are saved under save_path.
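
The `pred_html` string in the result above uses `rowspan` and `colspan` for merged cells. One possible way to turn such a string into a rectangular grid for further processing is sketched below with the standard-library `html.parser`. The class and function names are hypothetical, and the sketch assumes a single table with well-formed `<tr>` and `<td>` tags.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect cells as [text, rowspan, colspan] for each <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            rowspan = int(a.get("rowspan", 1))
            colspan = int(a.get("colspan", 1))
            self.rows[-1].append(["", rowspan, colspan])
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1][0] += data


def html_table_to_grid(html):
    """Expand rowspan/colspan so every row has one entry per column."""
    parser = TableGridParser()
    parser.feed(html)
    grid = {}  # (row, col) -> cell text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots filled by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    sample = (
        '<table><tr><td>A</td><td colspan="2">B</td></tr>'
        '<tr><td rowspan="2">C</td><td>D</td><td>E</td></tr>'
        "<tr><td>F</td><td>G</td></tr></table>"
    )
    for row in html_table_to_grid(sample):
        print(row)
```

Merged cells are repeated in every grid slot they cover, which keeps each row the same length. Whether repeating or blanking the covered slots is preferable depends on the downstream use.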

The command line is convenient for a quick trial. Integrating the pipeline into a project also takes only a few lines of code:

from paddleocr import TableRecognitionPipelineV2

pipeline = TableRecognitionPipelineV2(
    use_doc_orientation_classify=False, # Enable/disable the document orientation classification model
    use_doc_unwarping=False, # Enable/disable the document unwarping module
)
# pipeline = TableRecognitionPipelineV2(use_doc_orientation_classify=True) # Use the document orientation classification model
# pipeline = TableRecognitionPipelineV2(use_doc_unwarping=True) # Use the text image unwarping module
# pipeline = TableRecognitionPipelineV2(device="gpu") # Run model inference on the GPU
output = pipeline.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png")
for res in output:
    res.print() # Print the predicted structured output
    res.save_to_img("./output/")
    res.save_to_xlsx("./output/")
    res.save_to_html("./output/")
    res.save_to_json("./output/")

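PP-StructureV3

Layout analysis is a technique for extracting structured information from document images. PP-StructureV3 consists of the following six modules:

Layout detection module
General OCR pipeline
Document image preprocessing pipeline (optional)
Table recognition pipeline (optional)
Seal recognition pipeline (optional)
Formula recognition pipeline (optional)

You can try the PP-StructureV3 pipeline with the following command: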

paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --device gpu:0

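The results are printed to the terminal. If save_path is specified, the results are also saved under save_path.

Inference with the pipeline takes only a few lines of code. Taking the PP-StructureV3 pipeline as an example: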

from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    use_doc_orientation_classify=False, # Enable/disable the document orientation classification model
    use_doc_unwarping=False, # Enable/disable the document unwarping module
    use_textline_orientation=False, # Enable/disable the text line orientation classification model
    device="gpu:0", # Run model inference on the GPU
)
# Download the sample image to the working directory before running.
output = pipeline.predict("./mG4tnwfrvECoFMu-S9mxo.png")
for res in output:
    res.print() # Print the structured prediction output
    res.save_to_json(save_path="output") # Save the structured result for the current image as JSON
    res.save_to_markdown(save_path="output") # Save the result for the current image as Markdown