RT-DETR-L_wired_table_cell_det open-source model - accurately locates and marks cell regions in table images

RT DETR L Wired Table Cell Det

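Developed by PaddlePaddle

RT-DETR-L_wired_table_cell_det is a key module for table recognition tasks. Its main role is to locate and mark each cell region in a table image.

Text recognition | Multilingual | Open-source license: Apache-2.0 | #TableCellDetection #HighPrecisionLocalization #StructuredDataExtraction

Downloads: 1,144

Release date: 6/6/2025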

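Model Overview

This model is a key component of table recognition tasks. It locates and marks the cell regions in a table image, and its performance directly affects the accuracy and efficiency of the whole table recognition process.

Model Features

High-accuracy detection

Reaches 82.7% accuracy, locating each cell region in a table precisely.

Efficient inference

GPU inference takes 35 ms in normal mode and 10.45 ms in high-performance mode.

Lightweight

The stored model is 124 MB, which keeps deployment simple.

Easy integration

Integrates seamlessly with other PaddleOCR modules to form a complete table recognition solution.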

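Model Capabilities

Table cell detection

Table structure recognition

Image analysis

Use Cases

Document processing

Financial statement recognition

Automatically recognizes the cell structure in financial statements and extracts the data.

Extracts table data accurately to support downstream analysis and processing.

Invoice processing

Recognizes the key information regions in invoice tables.

Raises the level of automation in invoice data extraction.

Office automation

PDF table conversion

Converts tables in PDF documents into structured data.

Enables digitization of documents.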

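🚀 RT-DETR-L_wired_table_cell_det

RT-DETR-L_wired_table_cell_det is a key module for table recognition tasks. Its main role is to locate and mark each cell region in a table image, and its performance directly affects the accuracy and efficiency of table recognition as a whole.

🚀 Quick Start

Installation

1. Install PaddlePaddle

Install PaddlePaddle with pip, using the following commands as a reference: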

# for CUDA11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# for CUDA12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# for CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

For details on installing PaddlePaddle, see the official PaddlePaddle website.

2. Install PaddleOCR

Install the latest version of the PaddleOCR inference package from PyPI:

python -m pip install paddleocr
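
After installing, it can help to confirm which packages actually landed in the current environment. The following is a minimal sketch using only the Python standard library. The helper name `installed_version` is hypothetical, and the distribution names are the ones used in the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Only one of paddlepaddle / paddlepaddle-gpu is expected to be present.
for name in ("paddlepaddle", "paddlepaddle-gpu", "paddleocr"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If both PaddlePaddle variants report "not installed", the install commands above were probably run in a different Python environment.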

Using the Model

Try it with a single command

You can try the model right away with a single command:

paddleocr table_cells_detection \
    --model_name RT-DETR-L_wired_table_cell_det \
    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/JUU_5wJWVo4PcmJhSdIo3.png

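Integrating into your project

You can also integrate inference with the table cell detection module into your own project. Before running the code below, download the sample image to your local machine.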

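from paddleocr import TableCellsDetection

# Load the wired-table cell detection model by name
model = TableCellsDetection(model_name="RT-DETR-L_wired_table_cell_det")
# Run inference on the downloaded sample image
output = model.predict("JUU_5wJWVo4PcmJhSdIo3.png", threshold=0.3, batch_size=1)
for res in output:
    res.print(json_format=False)  # print the result to the terminal
    res.save_to_img("./output/")  # save the visualization image
    res.save_to_json("./output/res.json")  # save the result as JSON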

Running it produces a result like the following:

{'res': {'input_path': 'JUU_5wJWVo4PcmJhSdIo3.png', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'cell', 'score': 0.9719462394714355, 'coordinate': [98.776054, 48.676155, 235.74197, 94.76812]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9706293344497681, 'coordinate': [235.65723, 48.66303, 473.31378, 94.746185]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9692592620849609, 'coordinate': [235.62718, 164.7009, 473.3329, 211.70175]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9682302474975586, 'coordinate': [98.61444, 164.80591, 235.63733, 211.60106]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9662815928459167, 'coordinate': [1.914098, 48.64288, 98.82235, 94.75366]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9643649458885193, 'coordinate': [1.8260963, 164.74123, 98.64024, 211.56848]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9605159759521484, 'coordinate': [98.783226, 117.873886, 235.74089, 141.91118]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9604074358940125, 'coordinate': [98.77425, 94.79676, 235.80171, 117.937065]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9603073596954346, 'coordinate': [98.788315, 1.8037335, 235.8512, 24.844206]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9592577815055847, 'coordinate': [235.70949, 94.7883, 473.3138, 117.90771]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9591122269630432, 'coordinate': [98.85015, 24.80603, 235.73082, 48.770897]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9586214423179626, 'coordinate': [235.62253, 1.8327671, 473.30493, 24.799725]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9583646059036255, 'coordinate': [235.7168, 117.81723, 473.26074, 141.87694]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9580551385879517, 'coordinate': [98.747986, 141.79, 235.71774, 164.90057]}, {'cls_id': 0, 'label': 'cell', 'score': 0.957258939743042, 'coordinate': [235.6782, 24.70515, 473.0595, 48.79732]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9568949937820435, 'coordinate': [1.8317447, 94.74939, 98.85935, 117.94785]}, 
{'cls_id': 0, 'label': 'cell', 'score': 0.9563664793968201, 'coordinate': [1.8571337, 1.8207415, 98.98403, 24.901613]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9562588334083557, 'coordinate': [235.67096, 141.72911, 473.3746, 164.82388]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9557535648345947, 'coordinate': [1.922168, 117.84509, 98.85703, 141.85947]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9551460146903992, 'coordinate': [1.8364778, 141.7853, 98.83259, 164.88046]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9547295570373535, 'coordinate': [2.0152304, 24.793072, 98.84856, 48.75716]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9525823593139648, 'coordinate': [235.63931, 211.63988, 473.2472, 254.16182]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9454454779624939, 'coordinate': [98.62049, 211.4913, 235.57971, 254.40237]}, {'cls_id': 0, 'label': 'cell', 'score': 0.9410758018493652, 'coordinate': [1.9204835, 211.48651, 98.601524, 254.9897]}]}}

The run also produces a visualization image (not reproduced here). For details on the commands and parameters, see the documentation.
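
The detection result is a flat list of boxes ordered by score, so downstream code usually has to recover the row structure itself. The sketch below groups boxes into rows by vertical position and orders each row left to right. It is an illustration only, not PaddleOCR's own logic: the function name `group_cells_into_rows` and the `min_score` parameter are hypothetical, the six sample boxes are copied from the output above, and the approach assumes no cell spans multiple rows.

```python
def group_cells_into_rows(boxes, min_score=0.3):
    """Group detected cells into rows (top to bottom), each sorted left to right.

    Each box is a dict with 'score' and 'coordinate' = [x1, y1, x2, y2].
    """
    cells = [b for b in boxes if b["score"] >= min_score]
    # Visit cells from top to bottom by vertical center.
    cells.sort(key=lambda b: (b["coordinate"][1] + b["coordinate"][3]) / 2)

    rows = []
    for cell in cells:
        _, y1, _, y2 = cell["coordinate"]
        y_center = (y1 + y2) / 2
        # A cell joins the current row if its center lies inside the
        # vertical span of the cell that opened that row.
        if rows and rows[-1]["y_min"] <= y_center <= rows[-1]["y_max"]:
            rows[-1]["cells"].append(cell)
        else:
            rows.append({"y_min": y1, "y_max": y2, "cells": [cell]})

    return [sorted(r["cells"], key=lambda b: b["coordinate"][0]) for r in rows]


# Six boxes taken from the sample output above (first two table rows).
sample_boxes = [
    {"score": 0.9603073596954346, "coordinate": [98.788315, 1.8037335, 235.8512, 24.844206]},
    {"score": 0.9591122269630432, "coordinate": [98.85015, 24.80603, 235.73082, 48.770897]},
    {"score": 0.9586214423179626, "coordinate": [235.62253, 1.8327671, 473.30493, 24.799725]},
    {"score": 0.957258939743042, "coordinate": [235.6782, 24.70515, 473.0595, 48.79732]},
    {"score": 0.9563664793968201, "coordinate": [1.8571337, 1.8207415, 98.98403, 24.901613]},
    {"score": 0.9547295570373535, "coordinate": [2.0152304, 24.793072, 98.84856, 48.75716]},
]

rows = group_cells_into_rows(sample_boxes)
print([len(r) for r in rows])  # [3, 3]
```

Tables with merged cells need the structure information from the table structure recognition module; geometric grouping alone is not enough there.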

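Using Pipelines

A single model has limited capabilities, but a pipeline composed of several models can solve difficult problems in real-world scenarios.

General Table Recognition V2 Pipeline

The General Table Recognition V2 pipeline handles table recognition tasks: it extracts information from images and outputs it in HTML or Excel format. The pipeline contains 8 modules:

Table classification module
Table structure recognition module
Table cell detection module
Text detection module
Text recognition module
Layout region detection module (optional)
Document image orientation classification module (optional)
Text image unwarping module (optional)

You can try the General Table Recognition V2 pipeline right away with a single command: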

paddleocr table_recognition_v2 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png  \
    --use_doc_orientation_classify False  \
    --use_doc_unwarping False \
    --save_path ./output \
    --device gpu:0

The result is printed to the terminal:

{'res': {'input_path': 'mabagznApI1k9R8qFoTLc.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_ocr_model': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 8, 'label': 'table', 'score': 0.86655592918396, 'coordinate': [0.0125130415, 0.41920784, 1281.3737, 585.3884]}]}, 'overall_ocr_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'text_det_params': {'limit_side_len': 960, 'limit_type': 'max', 'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 2.0}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0, 'rec_texts': ['部门', '报销人', '报销事由', '批准人：', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'rec_boxes': array([[   9, ...,   59],
       ...,
       [1046, ...,  573]], dtype=int16)}, 'table_res_list': [{'cell_box_list': [array([ 0.13052222, ..., 73.08310249]), array([104.43082511, ...,  73.27777413]), array([319.39041221, ...,  73.30439308]), array([424.2436837 , ...,  73.44736794]), array([580.75836265, ...,  73.24003914]), array([723.04370201, ...,  73.22717598]), array([984.67315757, ...,  73.20420387]), array([1.25130415e-02, ..., 5.85419208e+02]), array([984.37072837, ..., 137.02281502]), array([984.26586998, ..., 201.22290352]), array([984.24017417, ..., 585.30775765]), array([1039.90606773, ...,  265.44664314]), array([1039.69549644, ...,  329.30540779]), array([1039.66546714, ...,  393.57319954]), array([1039.5122689 , ...,  457.74644783]), array([1039.55535972, ...,  521.73030403]), array([1039.58612144, ...,  585.09468392])], 'pred_html': '<html><body><table><tbody><tr><td>部门</td><td></td><td>报销人</td><td></td><td>报销事由</td><td></td><td colspan="2">批准人：</td></tr><tr><td colspan="6" rowspan="8"></td><td colspan="2">单据 张</td></tr><tr><td colspan="2">合计金额 元</td></tr><tr><td rowspan="6">其 中</td><td>车费票</td></tr><tr><td>火车费票</td></tr><tr><td>飞机票</td></tr><tr><td>旅住宿费</td></tr><tr><td>其他</td></tr><tr><td>补贴</td></tr></tbody></table></body></html>', 'table_ocr_pred': {'rec_polys': array([[[   9,   21],
        ...,
        [   9,   59]],

       ...,

       [[1046,  536],
        ...,
        [1046,  573]]], dtype=int16), 'rec_texts': ['部门', '报销人', '报销事由', '批准人：', '单据', '张', '合计金额', '元', '车费票', '其', '火车费票', '飞机票', '中', '旅住宿费', '其他', '补贴'], 'rec_scores': array([0.99958128, ..., 0.99317062]), 'rec_boxes': array([[   9, ...,   59],
       ...,
       [1046, ...,  573]], dtype=int16)}}]}}

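If save_path is specified, the visualization results are saved under save_path (the visualization image is not reproduced here).

The command line is convenient for a quick trial. To integrate the pipeline into a project, a few lines of code are enough: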

from paddleocr import TableRecognitionPipelineV2

pipeline = TableRecognitionPipelineV2(
    use_doc_orientation_classify=False, # Enable or disable the document orientation classification model
    use_doc_unwarping=False, # Enable or disable the document unwarping module
)
# pipeline = TableRecognitionPipelineV2(use_doc_orientation_classify=True) # Turn on the document orientation classification model
# pipeline = TableRecognitionPipelineV2(use_doc_unwarping=True) # Turn on the text image unwarping module
# pipeline = TableRecognitionPipelineV2(device="gpu") # Run model inference on the GPU
output = pipeline.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mabagznApI1k9R8qFoTLc.png")
for res in output:
    res.print() # Print the predicted structured output
    res.save_to_img("./output/")
    res.save_to_xlsx("./output/")
    res.save_to_html("./output/")
    res.save_to_json("./output/")

For details on the commands and parameters, see the documentation.
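
The `pred_html` field in the pipeline output above is a plain HTML table string, so it can be turned into rows of cells without extra dependencies. The following is a minimal sketch using Python's standard `html.parser`; the class name `TableCellParser` is hypothetical, and the input is a three-row excerpt of the `pred_html` value shown above, re-wrapped so that it is well-formed.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect table cells row by row, keeping text, colspan and rowspan."""

    def __init__(self):
        super().__init__()
        self.rows = []      # list of rows, each a list of cell dicts
        self._cell = None   # cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            attr = dict(attrs)
            self._cell = {
                "text": "",
                "colspan": int(attr.get("colspan", 1)),
                "rowspan": int(attr.get("rowspan", 1)),
            }

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append(self._cell)
            self._cell = None


# Excerpt (first three rows) of the pred_html string from the output above.
pred_html_excerpt = (
    "<html><body><table><tbody>"
    "<tr><td>部门</td><td></td><td>报销人</td><td></td><td>报销事由</td><td></td>"
    '<td colspan="2">批准人：</td></tr>'
    '<tr><td colspan="6" rowspan="8"></td><td colspan="2">单据 张</td></tr>'
    '<tr><td colspan="2">合计金额 元</td></tr>'
    "</tbody></table></body></html>"
)

parser = TableCellParser()
parser.feed(pred_html_excerpt)
for row in parser.rows:
    print([(c["text"], c["colspan"], c["rowspan"]) for c in row])
```

Keeping `colspan` and `rowspan` alongside the text matters here because the sample table contains merged cells, which a naive split on `<td>` would flatten.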

PP-StructureV3

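Layout analysis is a technique for extracting structured information from document images. PP-StructureV3 contains the following 6 modules:

Layout detection module
General OCR pipeline
Document image preprocessing pipeline (optional)
Table recognition pipeline (optional)
Seal recognition pipeline (optional)
Formula recognition pipeline (optional)

You can try the PP-StructureV3 pipeline right away with a single command: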

paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --device gpu:0

The result is printed to the terminal. If save_path is specified, the results are saved under save_path.

Pipeline inference takes only a few lines of code. Taking the PP-StructureV3 pipeline as an example:

from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    use_doc_orientation_classify=False, # Enable or disable the document orientation classification model
    use_doc_unwarping=False,    # Enable or disable the document unwarping module
    use_textline_orientation=False, # Enable or disable the text line orientation classification model
    device="gpu:0", # Run model inference on the GPU
    )
output = pipeline.predict("SfxF0X4drBTNGnfFOtZij.png")
for res in output:
    res.print() # Print the structured prediction output
    res.save_to_json(save_path="output") # Save the structured result for the current image as JSON
    res.save_to_markdown(save_path="output") # Save the result for the current image as Markdown

For details on the commands and parameters, see the documentation.

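✨ Key Features

The table cell detection module is a key component of table recognition tasks. It locates and marks each cell region in a table image, and its performance directly affects the accuracy and efficiency of table recognition as a whole. The module typically outputs a bounding box for each cell region, and these boxes are passed to the table recognition pipeline for further processing.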
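
To make the hand-off concrete, here is one way cell boxes could be combined with OCR text boxes: each text box is assigned to the cell that contains its center point. This is a simplified sketch and not the matching logic PaddleOCR actually uses; the function name `assign_texts_to_cells` and all coordinates and strings in the example are hypothetical.

```python
def assign_texts_to_cells(cell_boxes, ocr_items):
    """Assign each OCR text to the first cell containing the text box center.

    cell_boxes: list of (x1, y1, x2, y2) cell rectangles.
    ocr_items:  list of (text, (x1, y1, x2, y2)) recognized text boxes.
    Returns (assignments, unmatched), where assignments maps a cell index
    to the list of texts that fell inside that cell.
    """
    assignments = {i: [] for i in range(len(cell_boxes))}
    unmatched = []
    for text, (tx1, ty1, tx2, ty2) in ocr_items:
        cx, cy = (tx1 + tx2) / 2, (ty1 + ty2) / 2
        for i, (x1, y1, x2, y2) in enumerate(cell_boxes):
            if x1 <= cx <= x2 and y1 <= cy <= y2:
                assignments[i].append(text)
                break
        else:
            # No cell contains this text box center.
            unmatched.append(text)
    return assignments, unmatched


# Hypothetical data: two adjacent cells and three OCR text boxes.
cells = [(0, 0, 100, 50), (100, 0, 200, 50)]
texts = [
    ("Name", (10, 10, 60, 40)),
    ("Total", (120, 12, 180, 38)),
    ("stray", (300, 300, 340, 320)),
]

assignments, unmatched = assign_texts_to_cells(cells, texts)
print(assignments)  # {0: ['Name'], 1: ['Total']}
print(unmatched)    # ['stray']
```

Center-point containment is cheap and tolerant of text boxes that slightly overflow a cell border, which is why inaccurate cell boxes translate directly into text landing in the wrong cell.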

Model	Accuracy (%)	GPU inference time (ms) [normal mode / high-performance mode]	CPU inference time (ms) [normal mode / high-performance mode]	Model storage size (MB)
RT-DETR-L_wired_table_cell_det	82.7	35.00 / 10.45	495.51 / 495.51	124

Note: The accuracy of RT-DETR-L_wired_table_cell_det comes from joint testing with RT-DETR-L_wireless_table_cell_det.
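
The latencies in the table can be converted into rough throughput figures. The sketch below does that arithmetic, assuming each reported time is the latency for a single image processed one at a time; the table does not state the batch size, so treat the results as estimates. The helper name `images_per_second` is hypothetical.

```python
def images_per_second(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / latency_ms


# Latencies taken from the table above.
gpu_normal_ms = 35.00
gpu_high_perf_ms = 10.45
cpu_ms = 495.51

print(round(images_per_second(gpu_normal_ms), 1))     # 28.6
print(round(images_per_second(gpu_high_perf_ms), 1))  # 95.7
print(round(images_per_second(cpu_ms), 1))            # 2.0
print(round(gpu_normal_ms / gpu_high_perf_ms, 2))     # 3.35 (GPU speedup from high-performance mode)
```

On these numbers, high-performance mode gives roughly a 3.35x speedup on GPU, while the CPU figures are identical in both modes.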
