RT-DETR-L_wireless_table_cell_det: an open-source table detection model that accurately locates and marks table cells

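Developed by PaddlePaddle

RT-DETR-L_wireless_table_cell_det is a high-precision table cell detection model designed for table recognition tasks. It locates and marks each cell region in a table image.

Tags: text recognition, multilingual, open source. License: Apache-2.0. #table-cell-detection #high-precision-localization #multi-mode-inference

Downloads: 1,144

Release date: 6/6/2025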

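Model Overview

This model is a key component of the table recognition task: it locates and marks each cell region in a table image, and its performance directly affects the accuracy and efficiency of the whole table recognition process.

Model Features

High-precision detection

The model achieves high accuracy on table cell detection, with a reported Top1 Acc of 82.7%.

Multi-mode inference

Supports inference on both GPU and CPU, and offers a standard mode and a high-performance mode to suit different scenarios.

Rich pipelines

Provides the General Table Recognition V2 pipeline and the PP-StructureV3 pipeline for complex table recognition problems.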

Model Capabilities

Table cell detection

Multi-mode inference

Table recognition

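Use Cases

Document processing

Table recognition

Extracts table information from images and outputs it in HTML or Excel format.

Recognizes table structure and content with high accuracy.

Office automation

Expense report processing

Automatically recognizes and extracts table information from expense reports.

Improves office efficiency and reduces manual entry errors.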

🚀 RT-DETR-L_wireless_table_cell_det

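The table cell detection module is a key component of the table recognition task. It locates and marks each cell region in a table image, and its performance directly affects the accuracy and efficiency of the whole table recognition process.

🚀 Quick Start

This project provides the table cell detection model RT-DETR-L_wireless_table_cell_det. It also covers installation, usage examples, and the pipelines that use the model, so you can get started with table recognition quickly.

✨ Key Features

High-precision detection: RT-DETR-L_wireless_table_cell_det achieves high accuracy on table cell detection, with a reported Top1 Acc of 82.7%.
Multi-mode inference: supports inference on both GPU and CPU, and offers a standard mode and a high-performance mode to suit different scenarios.
Rich pipelines: provides the General Table Recognition V2 pipeline and the PP-StructureV3 pipeline for complex table recognition problems.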

📦 Installation

1. Install PaddlePaddle

Install PaddlePaddle with pip, using the command that matches your environment.

# For CUDA 11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# For CUDA 12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# For CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

For more details on installing PaddlePaddle, see the official PaddlePaddle website.

2. Install PaddleOCR

Install the latest version of the PaddleOCR inference package from PyPI.

python -m pip install paddleocr
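
After both installs, it can help to confirm which distributions pip actually registered before running a model. The following is a minimal sketch that uses only the Python standard library. The helper names `installed_version` and `report_environment` are hypothetical, and the distribution names are taken from the pip commands above. Normally only one of `paddlepaddle` and `paddlepaddle-gpu` is expected to be present.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_environment() -> Dict[str, Optional[str]]:
    """Collect versions for the distributions named in the install commands."""
    names = ["paddlepaddle", "paddlepaddle-gpu", "paddleocr"]
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in report_environment().items():
        print(f"{name}: {version or 'not installed'}")
```

This only checks package metadata. It does not import the packages or verify that a GPU is usable.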

💻 Usage Examples

Basic Usage

Run the following command to try the model right away.

paddleocr table_cells_detection \
    --model_name RT-DETR-L_wireless_table_cell_det \
    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/6rfhb-CXOHowonjpBsaUJ.png

You can also integrate inference for the table cell detection module into your own project. Download the sample image to your local machine before running the code below.

from paddleocr import TableCellsDetection
model = TableCellsDetection(model_name="RT-DETR-L_wireless_table_cell_det")
output = model.predict("6rfhb-CXOHowonjpBsaUJ.png", threshold=0.3, batch_size=1)
for res in output:
    res.print(json_format=False)
    res.save_to_img("./output/")
    res.save_to_json("./output/res.json")

Running it produces the following result.

{'res': {'input_path': '6rfhb-CXOHowonjpBsaUJ.png', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'cell', 'score': 0.9398849606513977, 'coordinate': [54.36941, 112.458046, 199.20259, 148.8335]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9389436841011047, 'coordinate': [54.376297, 38.66652, 200.09431, 75.04275]}, {'cls_id': 0, 'label': 'cell', 'score': 0.93695068359375, 'coordinate': [54.526768, 75.07727, 199.69261, 112.47577]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9276502132415771, 'coordinate': [256.82742, 112.23729, 327.20367, 148.69609]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9260919690132141, 'coordinate': [392.2286, 112.35808, 494.87323, 148.67969]}, {'cls_id': 0, 'label': 'cell', 'score': 0.926089882850647, 'coordinate': [55.078747, 148.77213, 198.78673, 181.62665]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9243109822273254, 'coordinate': [256.32922, 74.816475, 327.04968, 112.294014]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9232685565948486, 'coordinate': [54.62298, 6.616625, 199.83049, 38.849678]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9232298135757446, 'coordinate': [327.01968, 112.26065, 392.36826, 148.74333]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9225671291351318, 'coordinate': [256.76163, 39.040295, 326.9102, 74.86264]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9212655425071716, 'coordinate': [326.59286, 74.8661, 392.7218, 112.223015]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9207153916358948, 'coordinate': [392.2682, 74.9181, 494.8996, 112.21204]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9201209545135498, 'coordinate': [393.05807, 39.280144, 494.52887, 74.76607]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9167036414146423, 'coordinate': [326.6303, 38.908886, 392.46747, 74.80093]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9165226817131042, 'coordinate': [198.91599, 112.36962, 256.72226, 148.70464]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9159488081932068, 'coordinate': [200.06506, 38.73822, 256.86224, 74.968956]}, 
{'cls_id': 0, 'label': 'cell', 'score': 0.9144055843353271, 'coordinate': [199.15344, 74.948166, 256.92688, 112.3458]}, {'cls_id': 0, 'label': 'cell', 'score': 0.909517228603363, 'coordinate': [256.9021, 148.65999, 327.34952, 180.787]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9079439043998718, 'coordinate': [392.5967, 148.63753, 494.56372, 180.72824]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9076585173606873, 'coordinate': [393.64462, 6.3321157, 494.12646, 38.97421]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9043015837669373, 'coordinate': [256.7985, 6.373327, 326.6927, 39.124607]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9015249609947205, 'coordinate': [327.21558, 148.66805, 392.69656, 180.74384]}, {'cls_id': 0, 'label': 'cell', 'score': 0.8990758061408997, 'coordinate': [199.04855, 6.3791466, 256.9587, 38.893078]}, {'cls_id': 0, 'label': 'cell', 'score': 0.8976367712020874, 'coordinate': [326.987, 6.264301, 393.08954, 39.058624]}, {'cls_id': 0, 'label': 'cell', 'score': 0.8959962129592896, 'coordinate': [198.89633, 148.7314, 256.86224, 181.1719]}, {'cls_id': 0, 'label': 'cell', 'score': 0.8942931294441223, 'coordinate': [7.233109, 112.34024, 55.069206, 148.63686]}, {'cls_id': 0, 'label': 'cell', 'score': 0.8866638541221619, 'coordinate': [7.6031237, 75.04754, 54.86649, 112.31445]}, {'cls_id': 0, 'label': 'cell', 'score': 0.8835263848304749, 'coordinate': [7.8346314, 38.471584, 54.338577, 75.0842]}, {'cls_id': 0, 'label': 'cell', 'score': 0.8768432140350342, 'coordinate': [6.3656106, 148.65721, 55.30119, 181.48982]}, {'cls_id': 0, 'label': 'cell', 'score': 0.8766786456108093, 'coordinate': [8.270618, 6.590586, 54.000782, 38.58467]}]}}
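
The `boxes` list in this result can be post-processed with plain Python. The sketch below groups detected cells into rows and sorts each row left to right. It assumes that `coordinate` is `[x_min, y_min, x_max, y_max]` in pixels, which is consistent with the values printed above but is not stated in the source. The function name `group_cells_into_rows` and the parameters `min_score` and `row_tol` are hypothetical and not part of PaddleOCR.

```python
from typing import Dict, List


def group_cells_into_rows(boxes: List[Dict], min_score: float = 0.5,
                          row_tol: float = 10.0) -> List[List[Dict]]:
    """Group detected cells into rows; each row is sorted left to right."""

    def y_center(box: Dict) -> float:
        # Assumed layout: [x_min, y_min, x_max, y_max]
        _, y_min, _, y_max = box["coordinate"]
        return (y_min + y_max) / 2

    # Drop low-confidence boxes, then walk the rest from top to bottom.
    kept = sorted((b for b in boxes if b["score"] >= min_score), key=y_center)
    rows: List[List[Dict]] = []
    row_anchor = 0.0
    for box in kept:
        if rows and abs(y_center(box) - row_anchor) <= row_tol:
            rows[-1].append(box)  # same row as the current anchor
        else:
            rows.append([box])  # start a new row
            row_anchor = y_center(box)
    for row in rows:
        row.sort(key=lambda b: b["coordinate"][0])
    return rows


# Four boxes copied from the output above (cls_id omitted for brevity).
sample_boxes = [
    {"label": "cell", "score": 0.9398849606513977,
     "coordinate": [54.36941, 112.458046, 199.20259, 148.8335]},
    {"label": "cell", "score": 0.9389436841011047,
     "coordinate": [54.376297, 38.66652, 200.09431, 75.04275]},
    {"label": "cell", "score": 0.9276502132415771,
     "coordinate": [256.82742, 112.23729, 327.20367, 148.69609]},
    {"label": "cell", "score": 0.8835263848304749,
     "coordinate": [7.8346314, 38.471584, 54.338577, 75.0842]},
]

rows = group_cells_into_rows(sample_boxes)
for index, row in enumerate(rows):
    print(index, [round(b["coordinate"][0]) for b in row])
```

A fixed `row_tol` works for evenly spaced rows like this sample. Tables with merged or very tall cells would likely need a different grouping rule.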

The visualization image is shown below.

[Image: visualization of the detected table cells]

For details on the commands and parameters, see the documentation.

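Advanced Usage

A single model has limited capability, but a pipeline built from several models can handle difficult real-world cases.

General Table Recognition V2 Pipeline

The General Table Recognition V2 pipeline addresses table recognition: it extracts information from images and outputs it in HTML or Excel format. The pipeline contains eight modules: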

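Table classification module
Table structure recognition module
Table cell detection module
Text detection module
Text recognition module
Layout region detection module (optional)
Document image orientation classification module (optional)
Text image unwarping module (optional)

Run the following command to try the General Table Recognition V2 pipeline.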

paddleocr table_recognition_v2 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png  \
    --use_doc_orientation_classify False  \
    --use_doc_unwarping False \
    --save_path ./output \
    --device gpu:0

The result is printed to the terminal.

{'res': {'input_path': 'mabagznApI1k9R8qFoTLc.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 8, 'label': 'table', 'score': 0.86655592918396, 'coordinate': [0.0125130415, 0.41920784, 1281.3737, 585.3884]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['部门', '报销人', '报销事由', '批准人：', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'rec_boxes': array([[   9, ...,   59],
       ...,
       [1046, ...,  573]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([ 0.13052222, ..., 73.08310249]), array([104.43082511, ...,  73.27777413]), array([319.39041221, ...,  73.30439308]), array([424.2436837 , ...,  73.44736794]), array([580.75836265, ...,  73.24003914]), array([723.04370201, ...,  73.22717598]), array([984.67315757, ...,  73.20420387]), array([1.25130415e-02, ..., 5.85419208e+02]), array([984.37072837, ..., 137.02281502]), array([984.26586998, ..., 201.22290352]), array([984.24017417, ..., 585.30775765]), array([1039.90606773, ...,  265.44664314]), array([1039.69549644, ...,  329.30540779]), array([1039.66546714, ...,  393.57319954]), array([1039.5122689 , ...,  457.74644783]), array([1039.55535972, ...,  521.73030403]), array([1039.58612144, ...,  585.09468392])], 'pred_html': '<html><body><table><tbody><tr><td>部门</td><td></td><td>报销人</td><td></td><td>报销事由</td><td></td><td colspan="2">批准人：</td></tr><tr><td colspan="6" rowspan="8"></td><td colspan="2">单据 张</td></tr><tr><td colspan="2">合计金额 元</td></tr><tr><td rowspan="6">其 中</td><td>车费票</td></tr><tr><td>火车费票</td></tr><tr><td>飞机票</td></tr><tr><td>旅住宿费</td></tr><tr><td>其他</td></tr><tr><td>补贴</td></tr></tbody></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'rec_texts': ['部门', '报销人', '报销事由', '批准人：', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_boxes': array([[   9, ...,   59],
       ...,
       [1046, ...,  573]], dtype=int16)}}]}}
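
The `pred_html` field in this result is an ordinary HTML table, so it can be inspected without extra dependencies. The sketch below uses Python's standard `html.parser` to collect each cell's text together with its `colspan` and `rowspan`. The class name `TableCellCollector` and the function `parse_pred_html` are hypothetical names chosen for illustration. The sample string contains only the first two rows of the `pred_html` shown above.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableCellCollector(HTMLParser):
    """Collect <td>/<th> cells row by row, keeping colspan and rowspan."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[Dict]] = []
        self._cell: Optional[Dict] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            attr = dict(attrs)
            self._cell = {
                "text": "",
                "colspan": int(attr.get("colspan") or 1),
                "rowspan": int(attr.get("rowspan") or 1),
            }

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate a cell that appears outside any <tr>
                self.rows.append([])
            self.rows[-1].append(self._cell)
            self._cell = None


def parse_pred_html(pred_html: str) -> List[List[Dict]]:
    collector = TableCellCollector()
    collector.feed(pred_html)
    collector.close()
    return collector.rows


# First two rows of the pred_html printed above; the rest is omitted here.
sample_html = (
    '<html><body><table><tbody>'
    '<tr><td>部门</td><td></td><td>报销人</td><td></td><td>报销事由</td><td></td>'
    '<td colspan="2">批准人：</td></tr>'
    '<tr><td colspan="6" rowspan="8"></td><td colspan="2">单据 张</td></tr>'
    '</tbody></table></body></html>'
)

parsed_rows = parse_pred_html(sample_html)
for row in parsed_rows:
    print([(c["text"], c["colspan"], c["rowspan"]) for c in row])
```

The parser keeps spans as metadata rather than expanding them into a full grid, which is enough for inspection. Building a rectangular grid would need an extra pass that accounts for `rowspan` carry-over.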

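If save_path is specified, the visualization results are saved in the save_path directory. The visualization output is shown below.

[Image: visualization of the table recognition result]

The command line is convenient for a quick trial. To integrate the pipeline into a project, a few lines of code are enough: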

from paddleocr import TableRecognitionPipelineV2

pipeline = TableRecognitionPipelineV2(
    use_doc_orientation_classify=False,  # enable/disable the document orientation classification model
    use_doc_unwarping=False,  # enable/disable the document unwarping module
)
# pipeline = TableRecognitionPipelineV2(use_doc_orientation_classify=True)  # turn on document orientation classification
# pipeline = TableRecognitionPipelineV2(use_doc_unwarping=True)  # turn on text image unwarping
# pipeline = TableRecognitionPipelineV2(device="gpu")  # run model inference on the GPU
output = pipeline.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png")
for res in output:
    res.print()  # print the predicted structured output
    res.save_to_img("./output/")
    res.save_to_xlsx("./output/")
    res.save_to_html("./output/")
    res.save_to_json("./output/")

For details on the commands and parameters, see the documentation.

PP-StructureV3

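Layout analysis is a technique for extracting structured information from document images. PP-StructureV3 includes the following six modules:

Layout detection module
General OCR pipeline
Document image preprocessing pipeline (optional)
Table recognition pipeline (optional)
Seal recognition pipeline (optional)
Formula recognition pipeline (optional)

Run the following command to try the PP-StructureV3 pipeline.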

paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --device gpu:0

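The result is printed to the terminal. If save_path is specified, the results are saved in the save_path directory. Pipeline inference can also be run with a few lines of code, shown here with the PP-StructureV3 pipeline: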

from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    use_doc_orientation_classify=False,  # enable/disable the document orientation classification model
    use_doc_unwarping=False,  # enable/disable the document unwarping module
    use_textline_orientation=False,  # enable/disable the text line orientation classification model
    device="gpu:0",  # run model inference on GPU 0
)
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
    res.print()  # print the structured prediction output
    res.save_to_json(save_path="output")  # save the structured result for the current image as JSON
    res.save_to_markdown(save_path="output")  # save the result for the current image as Markdown
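
Once results have been written with `save_to_json`, a quick way to see what was saved is to list the top-level keys of each JSON file. This is a minimal sketch using only the standard library. The helper name `summarize_json_results` is hypothetical, and the sketch makes no assumption about file names or about the structure inside each file beyond it being valid JSON.

```python
import json
from pathlib import Path
from typing import Dict, List


def summarize_json_results(output_dir: str) -> Dict[str, List[str]]:
    """Map each *.json file name in output_dir to its sorted top-level keys."""
    summary: Dict[str, List[str]] = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Non-dict payloads (for example a list) are reported with no keys.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary


if __name__ == "__main__":
    for name, keys in summarize_json_results("output").items():
        print(name, keys)
```

If the directory does not exist or holds no JSON files, the function returns an empty dictionary.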