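SLANet_plus: an open-source table structure recognition model that quickly converts table images into editable HTML

Slanet Plus

Developed by PaddlePaddle

SLANet_plus is a table structure recognition model. It converts non-editable table images into an editable table format such as HTML, plays a key role in table recognition systems, and improves both the accuracy and the efficiency of table recognition.

Text recognition · Multilingual · Open source · License: Apache-2.0 · #table-structure-recognition #HTML-conversion #multi-module-integration

Downloads: 1,121

Release date: 6/6/2025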

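Model Overview

SLANet_plus is a deep learning model specialized for table structure recognition. It locates the rows, columns, and cells of a table and converts non-editable table images into editable HTML. It is a core component of table recognition systems and can be integrated into a variety of document processing workflows.

Model Features

High-accuracy table structure recognition

Locates table rows, columns, and cells and converts non-editable table images into editable HTML.

Multi-module pipelines

Provides the General Table Recognition V2 pipeline and the PP-StructureV3 pipeline, which integrate table classification, structure recognition, text detection, text recognition, and other modules.

Efficient inference

The model's storage size is only 6.9M. It runs at good speed on both GPU and CPU, and GPU inference takes about 140 ms.

End-to-end solution

Covers the full process from image input to structured output, with output in formats such as HTML and Excel.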

Model Capabilities

Table structure recognition

Table image conversion

HTML output

Excel output

Coordinated multi-module processing

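Use Cases

Document processing

Financial statement recognition

Converts scanned financial statement images into editable HTML or Excel.

Recognizes the table structure and preserves the original relationships in the data.

Expense report processing

Automatically recognizes the table information in expense reports and outputs it in structured form.

The recognition accuracy is 63.69%, which significantly reduces manual data entry.

Data digitization

Digitization of historical documents

Converts tables in paper documents into an editable digital format.

Preserves the original table structure, which simplifies later data analysis and processing.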

🚀 SLANet_plus

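SLANet_plus is a table structure recognition model. It converts non-editable table images into an editable table format such as HTML, plays a key role in table recognition systems, and improves the accuracy and efficiency of table recognition.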

🚀 Quick Start

Installing dependencies

1. Install PaddlePaddle

Install PaddlePaddle with pip, using the command below that matches your environment.

# For CUDA 11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# For CUDA 12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# For CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

For more details on installing PaddlePaddle, see the official PaddlePaddle website.
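When the install step has to be scripted (for example in a setup tool), the three commands above can be expressed as a small lookup. This is a minimal sketch: the helper name `paddle_install_command` and the target keys `cu118`, `cu126`, and `cpu` are hypothetical, while the package names, version, and index URLs are copied from the commands above.

```python
# Package name and index URL per target, copied from the install commands above.
# The keys ("cu118", "cu126", "cpu") are illustrative labels, not official names.
PADDLE_INDEX = {
    "cu118": ("paddlepaddle-gpu", "https://www.paddlepaddle.org.cn/packages/stable/cu118/"),
    "cu126": ("paddlepaddle-gpu", "https://www.paddlepaddle.org.cn/packages/stable/cu126/"),
    "cpu": ("paddlepaddle", "https://www.paddlepaddle.org.cn/packages/stable/cpu/"),
}


def paddle_install_command(target: str, version: str = "3.0.0") -> list[str]:
    """Return the pip command (as an argument list) for the given target."""
    try:
        package, index_url = PADDLE_INDEX[target]
    except KeyError:
        raise ValueError(f"unknown target {target!r}; expected one of {sorted(PADDLE_INDEX)}") from None
    return ["python", "-m", "pip", "install", f"{package}=={version}", "-i", index_url]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is installed by accident.
    print(" ".join(paddle_install_command("cpu")))
```

Returning an argument list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.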

2. Install PaddleOCR

Install the latest version of the PaddleOCR inference package from PyPI.

python -m pip install paddleocr

Using the model

Trying it with a single command

A single command is enough to try the model.

paddleocr table_structure_recognition \
    --model_name SLANet_plus \
    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/6rfhb-CXOHowonjpBsaUJ.png

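Integrating into your project

You can also integrate inference with the table structure recognition module into your own project. Download the sample image to your local machine before running the following code.

from paddleocr import TableStructureRecognition

# Load the SLANet_plus table structure recognition model
model = TableStructureRecognition(model_name="SLANet_plus")
# Run inference on the locally downloaded sample image
output = model.predict(input="6rfhb-CXOHowonjpBsaUJ.png", batch_size=1)
for res in output:
    res.print(json_format=False)  # print the result to the terminal
    res.save_to_json("./output/res.json")  # save the result as JSON

Running it produces a result like the following.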

{'res': {'input_path': '6rfhb-CXOHowonjpBsaUJ.png', 'page_index': None, 'bbox': [[1, 2, 64, 2, 64, 41, 1, 41], [52, 1, 199, 1, 198, 38, 51, 38], [182, 1, 253, 1, 254, 40, 184, 40], [248, 1, 323, 1, 324, 41, 249, 41], [314, 1, 384, 1, 385, 40, 315, 40], [389, 2, 493, 2, 493, 45, 388, 44], [2, 42, 50, 42, 50, 77, 2, 77], [65, 42, 176, 42, 175, 77, 64, 77], [187, 40, 251, 40, 249, 79, 185, 79], [252, 41, 319, 41, 319, 80, 251, 80], [318, 40, 379, 40, 380, 78, 318, 78], [385, 39, 497, 39, 497, 84, 384, 83], [2, 82, 50, 82, 50, 118, 2, 118], [63, 80, 182, 80, 181, 114, 62, 114], [189, 80, 250, 80, 249, 114, 187, 114], [253, 80, 319, 80, 319, 114, 252, 114], [320, 78, 378, 79, 378, 114, 320, 114], [395, 77, 496, 78, 496, 118, 394, 118], [2, 117, 49, 118, 50, 155, 2, 155], [65, 115, 180, 115, 179, 151, 64, 151], [191, 115, 249, 115, 248, 150, 189, 150], [254, 115, 318, 115, 318, 150, 253, 150], [321, 114, 377, 114, 378, 150, 321, 150], [396, 113, 495, 113, 495, 154, 394, 153], [1, 153, 56, 153, 57, 192, 1, 191], [68, 152, 175, 152, 175, 189, 67, 189], [189, 152, 249, 152, 249, 188, 188, 188], [252, 152, 317, 152, 318, 188, 252, 188], [320, 150, 377, 151, 378, 188, 321, 187], [393, 150, 494, 151, 494, 193, 391, 192]], 'structure': ['<html>', '<body>', '<table>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</table>', '</body>', '</html>'], 'structure_score': 0.99635947}}

For details on commands and parameters, see the documentation.
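The result printed above holds the table as a list of HTML tokens in `structure` and one 8-value polygon per cell in `bbox`. The following is a minimal sketch of how those two fields might be post-processed. The helper names are hypothetical, and it assumes that every cell opens with a token starting with `<td` and that each polygon is stored as `x1, y1, x2, y2, x3, y3, x4, y4`, as the printed values suggest.

```python
def structure_to_html(tokens):
    """Join the structure tokens into a single HTML string."""
    return "".join(tokens)


def count_rows_and_cells(tokens):
    """Count rows and cells; a cell is any token that starts with '<td'."""
    rows = sum(1 for token in tokens if token == "<tr>")
    cells = sum(1 for token in tokens if token.startswith("<td"))
    return rows, cells


def polygon_to_rect(polygon):
    """Convert an 8-value polygon into an axis-aligned (x_min, y_min, x_max, y_max) box."""
    xs = polygon[0::2]  # x coordinates sit at even positions
    ys = polygon[1::2]  # y coordinates sit at odd positions
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # A shortened sample in the same shape as the printed result above.
    tokens = ["<html>", "<body>", "<table>", "<tr>", "<td></td>", "<td></td>", "</tr>",
              "</table>", "</body>", "</html>"]
    bboxes = [[1, 2, 64, 2, 64, 41, 1, 41], [52, 1, 199, 1, 198, 38, 51, 38]]

    print(structure_to_html(tokens))
    print(count_rows_and_cells(tokens))
    print([polygon_to_rect(box) for box in bboxes])
```

Applied to the full result above, the counting step gives 5 rows and 30 cells, which matches the 30 entries in `bbox`. Comparing the two counts is a cheap sanity check before pairing cells with their boxes.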

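Using the pipelines

A single model has limited capability, but a pipeline composed of several models can handle difficult real-world cases.

General Table Recognition V2 pipeline

The General Table Recognition V2 pipeline solves table recognition tasks: it extracts information from an image and outputs it in HTML or Excel format. The pipeline contains 8 modules:

Table classification module
Table structure recognition module
Table cell detection module
Text detection module
Text recognition module
Layout region detection module (optional)
Document image orientation classification module (optional)
Text image unwarping module (optional)

A single command runs the General Table Recognition V2 pipeline with its default settings. By default, this pipeline uses SLANeXt_wired and SLANeXt_wireless to predict table structure.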

paddleocr table_recognition_v2 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png  \
    --use_doc_orientation_classify False  \
    --use_doc_unwarping False \
    --save_path ./output \
    --device gpu:0

The result is printed to the terminal.

{'res': {'input_path': 'mabagznApI1k9R8qFoTLc.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 8, 'label': 'table', 'score': 0.86655592918396, 'coordinate': [0.0125130415, 0.41920784, 1281.3737, 585.3884]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['部门', '报销人', '报销事由', '批准人：', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'rec_boxes': array([[   9, ...,   59],
       ...,
       [1046, ...,  573]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([ 0.13052222, ..., 73.08310249]), array([104.43082511, ...,  73.27777413]), array([319.39041221, ...,  73.30439308]), array([424.2436837 , ...,  73.44736794]), array([580.75836265, ...,  73.24003914]), array([723.04370201, ...,  73.22717598]), array([984.67315757, ...,  73.20420387]), array([1.25130415e-02, ..., 5.85419208e+02]), array([984.37072837, ..., 137.02281502]), array([984.26586998, ..., 201.22290352]), array([984.24017417, ..., 585.30775765]), array([1039.90606773, ...,  265.44664314]), array([1039.69549644, ...,  329.30540779]), array([1039.66546714, ...,  393.57319954]), array([1039.5122689 , ...,  457.74644783]), array([1039.55535972, ...,  521.73030403]), array([1039.58612144, ...,  585.09468392])], 'pred_html': '<html><body><table><tbody><tr><td>部门</td><td></td><td>报销人</td><td></td><td>报销事由</td><td></td><td colspan="2">批准人：</td></tr><tr><td colspan="6" rowspan="8"></td><td colspan="2">单据 张</td></tr><tr><td colspan="2">合计金额 元</td></tr><tr><td rowspan="6">其 中</td><td>车费票</td></tr><tr><td>火车费票</td></tr><tr><td>飞机票</td></tr><tr><td>旅住宿费</td></tr><tr><td>其他</td></tr><tr><td>补贴</td></tr></tbody></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'rec_texts': ['部门', '报销人', '报销事由', '批准人：', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_boxes': array([[   9, ...,   59],
       ...,
       [1046, ...,  573]], dtype=int16)}}]}}
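The `pred_html` field in the result above is the piece most downstream code needs. As a minimal sketch, the table can be turned into rows of cell text with Python's standard `html.parser`. The class and function names here are hypothetical, and the sketch deliberately ignores `colspan` and `rowspan`, so a merged cell appears once rather than being expanded across the grid.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the currently open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def html_table_to_rows(html):
    """Return a list of rows, each a list of cell strings (spans are not expanded)."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # A shortened sample in the same shape as pred_html above.
    sample = ('<html><body><table><tbody><tr><td>部门</td><td></td></tr>'
              '<tr><td colspan="2">合计金额 元</td></tr></tbody></table></body></html>')
    for row in html_table_to_rows(sample):
        print(row)
```

Judging from the printed output, the HTML string for each detected table would be reachable as `result["res"]["table_res_list"][i]["pred_html"]`. Treat that path as an assumption, since the saved JSON may be laid out differently from the printed form.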

If save_path is specified, the visualization results are saved under save_path.

The command line is convenient for a quick trial. To integrate the pipeline into a project, a few lines of code are enough.

from paddleocr import TableRecognitionPipelineV2

pipeline = TableRecognitionPipelineV2(
    use_doc_orientation_classify=False,  # enable/disable the document orientation classification model
    use_doc_unwarping=False,  # enable/disable the document unwarping module
)
# pipeline = TableRecognitionPipelineV2(use_doc_orientation_classify=True)  # turn on the document orientation classification model
# pipeline = TableRecognitionPipelineV2(use_doc_unwarping=True)  # turn on the text image unwarping module
# pipeline = TableRecognitionPipelineV2(device="gpu")  # run model inference on the GPU
output = pipeline.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png")
for res in output:
    res.print()  # print the predicted structured output
    res.save_to_img("./output/")
    res.save_to_xlsx("./output/")
    res.save_to_html("./output/")
    res.save_to_json("./output/")

To run table recognition with the SLANet_plus model, change the model names and enable the end-to-end prediction mode.

paddleocr table_recognition_v2 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png  \
    --use_doc_orientation_classify False  \
    --use_doc_unwarping False \
    --wired_table_structure_recognition_model_name SLANet_plus \
    --use_e2e_wired_table_rec_model True \
    --wireless_table_structure_recognition_model_name SLANet_plus \
    --use_e2e_wireless_table_rec_model True \
    --save_path ./output \
    --device gpu:0

from paddleocr import TableRecognitionPipelineV2

pipeline = TableRecognitionPipelineV2(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    wired_table_structure_recognition_model_name="SLANet_plus",  # used for wired table recognition
    wireless_table_structure_recognition_model_name="SLANet_plus",  # used for wireless table recognition
)
output = pipeline.predict(
    "https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png",
    use_e2e_wired_table_rec_model=True,  # used for wired table recognition
    use_e2e_wireless_table_rec_model=True,  # used for wireless table recognition
)
for res in output:
    res.print()  # print the predicted structured output
    res.save_to_img("./output/")
    res.save_to_xlsx("./output/")
    res.save_to_html("./output/")
    res.save_to_json("./output/")

For details on commands and parameters, see the documentation.
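Multi-line shell commands like the one above break easily, because a stray space after a trailing backslash ends the line continuation. One way around this, sketched here under assumptions, is to build the argument list in Python and hand it to `subprocess`. The function name `table_recognition_v2_args` and its defaults are hypothetical. The flags are the ones used in the command above.

```python
import shlex


def table_recognition_v2_args(image, model_name="SLANet_plus",
                              save_path="./output", device="gpu:0"):
    """Build the argument list for the table_recognition_v2 command shown above."""
    return [
        "paddleocr", "table_recognition_v2", "-i", image,
        "--use_doc_orientation_classify", "False",
        "--use_doc_unwarping", "False",
        # The same model is used for wired and wireless tables, in end-to-end mode.
        "--wired_table_structure_recognition_model_name", model_name,
        "--use_e2e_wired_table_rec_model", "True",
        "--wireless_table_structure_recognition_model_name", model_name,
        "--use_e2e_wireless_table_rec_model", "True",
        "--save_path", save_path,
        "--device", device,
    ]


if __name__ == "__main__":
    args = table_recognition_v2_args("table.png")  # "table.png" is a placeholder path
    # Print a copy-pasteable, correctly quoted one-line command.
    print(shlex.join(args))
    # To run it instead: import subprocess; subprocess.run(args, check=True)
```

Passing a list to `subprocess.run` bypasses the shell entirely, so quoting and line continuation stop being a concern.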

PP-StructureV3

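Layout analysis is a technique for extracting structured information from document images. PP-StructureV3 contains the following 6 modules:

Layout detection module
General OCR pipeline
Document image preprocessing pipeline (optional)
Table recognition pipeline (optional)
Seal recognition pipeline (optional)
Formula recognition pipeline (optional)

A single command runs the PP-StructureV3 pipeline.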

paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --wired_table_structure_recognition_model_name SLANet_plus \
    --use_e2e_wired_table_rec_model True \
    --wireless_table_structure_recognition_model_name SLANet_plus \
    --use_e2e_wireless_table_rec_model True \
    --use_textline_orientation False \
    --device gpu:0

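The result is printed to the terminal. If save_path is specified, the results are saved under save_path.

Pipeline inference also takes only a few lines of code. The following uses the PP-StructureV3 pipeline as an example.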

from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    wired_table_structure_recognition_model_name="SLANet_plus",  # used for wired table recognition
    wireless_table_structure_recognition_model_name="SLANet_plus",  # used for wireless table recognition
    use_doc_orientation_classify=False,  # enable/disable the document orientation classification model
    use_doc_unwarping=False,  # enable/disable the document unwarping module
    use_textline_orientation=False,  # enable/disable the text line orientation classification model
    device="gpu:0",  # run model inference on the GPU
)
output = pipeline.predict(
    "mG4tnwfrvECoFMu-S9mxo.png",
    use_e2e_wired_table_rec_model=True,  # used for wired table recognition
    use_e2e_wireless_table_rec_model=True,  # used for wireless table recognition
)
for res in output:
    res.print()  # print the structured prediction output
    res.save_to_json(save_path="output")  # save the structured result for the current image as JSON
    res.save_to_markdown(save_path="output")  # save the result for the current image as Markdown

By default, the pipeline uses SLANeXt_wired and SLANeXt_wireless, so the model-name parameters must be set explicitly to switch to SLANet_plus. For details on commands and parameters, see the documentation.
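Because both pipelines need the same four overrides to switch from the default SLANeXt models, it can help to keep those overrides in one place. The helper below is a hypothetical sketch, not part of PaddleOCR. It only groups the parameter names that appear in the examples above into constructor and `predict` keyword arguments.

```python
def slanet_plus_overrides(model_name="SLANet_plus"):
    """Return (constructor kwargs, predict kwargs) that select the given model.

    The parameter names are taken from the pipeline examples above.
    """
    init_kwargs = {
        "wired_table_structure_recognition_model_name": model_name,
        "wireless_table_structure_recognition_model_name": model_name,
    }
    predict_kwargs = {
        "use_e2e_wired_table_rec_model": True,
        "use_e2e_wireless_table_rec_model": True,
    }
    return init_kwargs, predict_kwargs


# Intended use (requires paddleocr, so it is left as a comment here):
#
#   from paddleocr import PPStructureV3
#   init_kwargs, predict_kwargs = slanet_plus_overrides()
#   pipeline = PPStructureV3(use_doc_orientation_classify=False, **init_kwargs)
#   output = pipeline.predict("mG4tnwfrvECoFMu-S9mxo.png", **predict_kwargs)

if __name__ == "__main__":
    init_kwargs, predict_kwargs = slanet_plus_overrides()
    print(init_kwargs)
    print(predict_kwargs)
```

Splitting the overrides into two dictionaries mirrors the examples above, where the model names go to the constructor and the end-to-end flags go to `predict`.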

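✨ Key Features

Table structure recognition

Locates table rows, columns, and cells, converts non-editable table images into editable HTML, and serves as a core component of table recognition systems.

Multi-module pipelines

Provides the General Table Recognition V2 pipeline and the PP-StructureV3 pipeline, which integrate table classification, structure recognition, text detection, text recognition, and other modules to handle complex table recognition tasks.

Efficient inference

The model's storage size is only 6.9M, and it runs at good speed on both GPU and CPU, which suits a wide range of deployment scenarios.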
