PP-OCRv5_mobile_det: Open-Source Text Detection Model for Efficient Multilingual, Multi-Scenario Text Detection

PP OCRv5 Mobile Det

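Developed by PaddlePaddle

PP-OCRv5_mobile_det is the latest-generation lightweight text detection model developed by the PaddleOCR team. It supports efficient text detection across multiple languages and scenarios.

Tags: Text recognition, Multilingual, Open source. License: Apache-2.0. #Multi-scenario text detection #Multilingual support #Lightweight deployment

Downloads: 556

Release date: 6/4/2025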

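Model Overview

This model detects text efficiently and accurately in a wide range of scenarios, including handwritten, vertical, rotated, and curved text. Supported languages include Simplified Chinese, Traditional Chinese, English, and Japanese. It is suited to practical applications such as document analysis, license plate recognition, and scene text detection.

Model Features

Multi-scenario adaptability

Handles complex layouts, text of varying sizes, and difficult backgrounds reliably.

Multilingual support

Supports multiple languages, including Simplified Chinese, Traditional Chinese, English, and Japanese.

Efficient and lightweight

A lightweight model optimized for mobile devices that reduces compute requirements while maintaining high performance.

Complex text handling

Handles special text forms such as handwritten, vertical, rotated, and curved text.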

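Model Capabilities

Text detection

Handwritten text recognition

Printed text recognition

Multilingual text detection

Rotated text detection

Curved text detection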

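Use Cases

Document processing

Document analysis

Extracts text regions from scanned documents or photos.

Accurately identifies the position and orientation of text in a document.

Scene text recognition

License plate recognition

Detects the position of the license plate in a vehicle image.

Tightly bounds the plate region to prepare it for the subsequent recognition step.

Street-view text recognition

Extracts text such as signboards and road signs from street-view photos.

Accurately identifies text regions against complex backgrounds.

Special text processing

Handwritten note recognition

Extracts text regions from photos of handwritten notes.

Accurately identifies the position of handwritten text.

Digitization of ancient books

Locates text regions in scanned images of ancient books.

Supports text detection for Traditional Chinese and classical texts.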

🚀 PP-OCRv5_mobile_det

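PP-OCRv5_mobile_det is part of the PP-OCRv5_det series, the latest generation of text detection models developed by the PaddleOCR team. It detects text efficiently and accurately in a wide range of scenarios, including handwritten, vertical, rotated, and curved text. Supported languages include Simplified Chinese, Traditional Chinese, English, and Japanese. Its main strength is reliable handling of complex layouts, text of varying sizes, and difficult backgrounds, which makes it suited to practical applications such as document analysis, license plate recognition, and scene text detection. The key accuracy metrics are as follows.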

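Handwritten Chinese	Handwritten English	Printed Chinese	Printed English	Traditional Chinese	Ancient text	Japanese	General scenario	Pinyin	Rotated text	Distorted text	Artistic text	Average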
0.744	0.777	0.905	0.910	0.823	0.581	0.727	0.721	0.575	0.647	0.827	0.525	0.770
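
The per-category scores can be compared with a few lines of Python. This is only a sketch on the numbers copied from the table: it computes the plain unweighted mean of the 12 categories, which comes out near 0.730 rather than the reported 0.770. The source does not state how the reported average is computed, so the difference may simply reflect a weighting (for example by sample count); that is an assumption, not something the model card confirms.

```python
# Per-category scores copied from the accuracy table.
scores = {
    "Handwritten Chinese": 0.744,
    "Handwritten English": 0.777,
    "Printed Chinese": 0.905,
    "Printed English": 0.910,
    "Traditional Chinese": 0.823,
    "Ancient text": 0.581,
    "Japanese": 0.727,
    "General scenario": 0.721,
    "Pinyin": 0.575,
    "Rotated text": 0.647,
    "Distorted text": 0.827,
    "Artistic text": 0.525,
}
REPORTED_AVERAGE = 0.770  # value from the "Average" column

# Plain arithmetic mean over the 12 categories (no weighting).
unweighted_mean = sum(scores.values()) / len(scores)

# Categories with the lowest and highest score.
weakest = min(scores, key=scores.get)
strongest = max(scores, key=scores.get)

print(f"unweighted mean: {unweighted_mean:.3f} (reported: {REPORTED_AVERAGE:.3f})")
print(f"weakest category: {weakest} ({scores[weakest]:.3f})")
print(f"strongest category: {strongest} ({scores[strongest]:.3f})")
```

The ranking is a quick way to see where the mobile model is likely to need the most care, such as artistic text and Pinyin.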

🚀 Quick Start

📦 Installation

1. Install PaddlePaddle

Install PaddlePaddle with pip, using the commands below as a reference.

# For CUDA 11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# For CUDA 12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# For CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

For more information on installing PaddlePaddle, see the official PaddlePaddle website.

2. Install PaddleOCR

Install the latest version of the PaddleOCR inference package from PyPI.

python -m pip install paddleocr
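
After both installs, a quick sanity check can confirm that the packages are importable in the current environment. The helper below is a minimal sketch using only the standard library; `check_packages` is a hypothetical name introduced here, and it only tests whether the modules can be found, not whether the GPU build works.

```python
import importlib.util


def check_packages(names):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# The paddlepaddle wheel is imported as "paddle"; PaddleOCR is imported as "paddleocr".
status = check_packages(["paddle", "paddleocr"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

If either module is reported as missing, rerun the matching pip command above in the same Python environment.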

💻 Usage Examples

Basic usage

You can try the model right away with a single command.

paddleocr text_detection \
    --model_name PP-OCRv5_mobile_det \
    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png

You can also integrate inference for the text detection module into your own project. Download the sample image to your local machine before running the code below.

from paddleocr import TextDetection
model = TextDetection(model_name="PP-OCRv5_mobile_det")
output = model.predict(input="3ul2Rq4Sk5Cn-l69D695U.png", batch_size=1)
for res in output:
    res.print()
    res.save_to_img(save_path="./output/")
    res.save_to_json(save_path="./output/res.json")

Running it produces the following result.

{'res': {'input_path': '/root/.paddlex/predict_input/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'dt_polys': array([[[ 105, 1431],
        ...,
        [ 105, 1452]],

       ...,

       [[ 353,  106],
        ...,
        [ 353,  129]]], dtype=int16), 'dt_scores': [0.8306416015066644, 0.7603795581201811, ..., 0.8819806867477359]}}
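
The result pairs each polygon in dt_polys with a confidence in dt_scores. A common next step is to drop low-confidence detections and reduce each polygon to an axis-aligned box for cropping. The following is a sketch on plain Python lists, assuming the arrays have been converted to nested lists first; the function names, the sample coordinates, and the 0.8 threshold are illustrative and not part of PaddleOCR.

```python
def polygon_to_box(poly):
    """Convert a polygon [[x, y], ...] into (x_min, y_min, x_max, y_max)."""
    xs = [point[0] for point in poly]
    ys = [point[1] for point in poly]
    return (min(xs), min(ys), max(xs), max(ys))


def filter_detections(dt_polys, dt_scores, min_score=0.8):
    """Keep detections whose score is at least min_score, as boxes with scores."""
    kept = []
    for poly, score in zip(dt_polys, dt_scores):
        if score >= min_score:
            kept.append({"box": polygon_to_box(poly), "score": score})
    return kept


# Hypothetical sample shaped like the output above (coordinates are made up).
sample_polys = [
    [[105, 1431], [400, 1431], [400, 1452], [105, 1452]],
    [[353, 106], [900, 106], [900, 129], [353, 129]],
]
sample_scores = [0.76, 0.88]

for det in filter_detections(sample_polys, sample_scores):
    print(det)
```

With these sample values only the second region survives the threshold.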

The visualized image is shown below. [Image: visualized detection result] For details on the commands and parameters, refer to the documentation.

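Advanced usage

A single model has limited capabilities, but a pipeline composed of several models is better equipped to handle hard real-world problems.

PP-OCRv5

The general OCR pipeline addresses text recognition tasks: it extracts text from images and outputs it as strings. The pipeline contains the following five modules.

Document image orientation classification module (optional)
Text image unwarping module (optional)
Text line orientation classification module (optional)
Text detection module
Text recognition module

Run the following command to try the OCR pipeline right away.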

paddleocr ocr -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png \
    --text_detection_model_name PP-OCRv5_mobile_det \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation True \
    --save_path ./output \
    --device gpu:0

The result is printed to the terminal.

{'res': {'input_path': 'printing_en/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': True}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1}, 'dt_polys': array([[[ 352,  105],
        ...,
        [ 352,  128]],

       ...,

       [[ 632, 1431],
        ...,
        [ 632, 1447]]], dtype=int16), 'text_det_params': {'limit_side_len': 64, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([0, ..., 0]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['Algorithms for the Markov Entropy Decomposition', 'Andrew J. Ferris and David Poulin', 'Département de Physique, Université de Sherbrooke, Québec, JI K 2R1, Canada', '(Dated: October 31, 2018)', 'The Markov entropy decomposition (MED) is a recently-proposed, cluster-based simulation method for fi -', 'nite temperature quantum systems with arbitrary geometry. In this paper, we detail numerical algorithms for', 'performing the required steps of the MED, principally solving a minimization problem with a preconditioned', 'arXiv:1212.1442v1 [cond-mat.stat-mech] 6 Dec 2012', "Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate", 'the power of the method with the spin-1/2 XXZ model on the 2D square lattice, including the extraction of', 'critical points and details of each phase. Although the method shares some qualitative similarities with exact-', 'diagonalization, we show the MED is both more accurate and significantly more flexible.', 'PACS numbers: 05.10.—a, 02.50.Ng, 03.67.–a, 74.40.Kb', 'I. INTRODUCTION', 'This approximation becomes exact in the case of a 1D quan-', 'tum (or classical) Markov chain [1O], and leads to an expo-', 'Although the equations governing quantum many-body', 'nential reduction of cost for exact entropy calculations when', 'systems are simple to write down, finding solutions for the', 'the global density matrix is a higher-dimensional Markov net-', 'majority of systems remains incredibly difficult. 
Modern', 'work state [12, 13].', 'physics finds itself in need of new tools to compute the emer-', 'The second approximation used in the MED approach is', 'gent behavior of large, many-body systems.', 'related to the N-representibility problem. Given a set of lo-', 'There has been a great variety of tools developed to tackle', 'cal but overlapping reduced density matrices { ρi }, it is a very', 'many-body problems, but in general, large 2D and 3D quan-', 'challenging problem to determine if there exists a global den.', 'tum systems remain hard to deal with. Most systems are', 'sity operator which is positive semi-definite and whose partial', 'thought to be non-integrable, so exact analytic solutions are', 'trace agrees with each ρi. This problem is QMA-hard (the', 'not usually expected. Direct numerical diagonalization can be', 'quantum analogue of NP) [14, 15], and is hopelessly diffi-', 'performed for relatively small systems — however the emer-', 'cult to enforce. Thus, the second approximation employed', 'gent behavior of a system in the thermodynamic limit may be', 'involves ignoring global consistency with a positive opera-', 'difficult to extract, especially in systems with large correlation', 'tor, while requiring local consistency on any overlapping re-', 'lengths. Monte Carlo approaches are technically exact (up to', 'gions between the ρi. At the zero-temperature limit, the MED', 'sampling error), but suffer from the so-called sign problem', 'approach becomes analogous to the variational nth-order re-', 'for fermionic, frustrated, or dynamical problems. 
Thus we are', 'duced density matrix approach, where positivity is enforced', 'limited to search for clever approximations to solve the ma-', 'on all reduced density matrices of size n [16–18].', 'jority of many-body problems.', 'The MED approach is an extremely flexible cluster method.', 'Over the past century, hundreds of such approximations', 'applicable to both translationally invariant systems of any di-', 'have been proposed, and we will mention just a few notable', 'mension in the thermodynamic limit, as well as finite systems', 'examples applicable to quantum lattice models. Mean-field', 'or systems without translational invariance (e.g. disordered', 'theory is simple and frequently arrives at the correct quali-', 'lattices, or harmonically trapped atoms in optical lattices).', 'tative description, but often fails when correlations are im-', 'The free energy given by MED is guaranteed to lower bound', 'portant. Density-matrix renormalisation group (DMRG) [1]', 'the true free energy, which in turn lower-bounds the ground', 'is efficient and extremely accurate at solving 1D problems,', 'state energy — thus providing a natural complement to varia-', 'but the computational cost grows exponentially with system', 'tional approaches which upper-bound the ground state energy.', 'size in two- or higher-dimensions [2, 3]. Related tensor-', 'The ability to provide a rigorous ground-state energy window', 'network techniques designed for 2D systems are still in their', 'is a powerful validation tool, creating a very compelling rea-', 'infancy [4–6]. Series-expansion methods [7] can be success-', 'son to use this approach.', 'ful, but may diverge or otherwise converge slowly, obscuring', 'In this paper we paper we present a pedagogical introduc-', 'the state in certain regimes. 
There exist a variety of cluster-', 'tion to MED, including numerical implementation issues and', 'based techniques, such as dynamical-mean-field theory [8]', 'applications to 2D quantum lattice models in the thermody-', 'and density-matrix embedding [9]', 'namic limit. In Sec. II. we giye a brief deriyation of the', 'Here we discuss the so-called Markov entropy decompo-', 'Markov entropy decomposition. Section III outlines a robust', 'sition (MED), recently proposed by Poulin & Hastings [10]', 'numerical strategy for optimizing the clusters that make up', '(and analogous to a slightly earlier classical algorithm [11]).', 'the decomposition. In Sec. IV we show how we can extend', 'This is a self-consistent cluster method for fi nite temperature', 'these algorithms to extract non-trivial information, such as', 'systems that takes advantage of an approximation of the (von', 'specific heat and susceptibilities. We present an application of', 'Neumann) entropy. In [10], it was shown that the entropy', 'the method to the spin-1/2 XXZ model on a 2D square lattice', 'per site can be rigorously upper bounded using only local in-', 'in Sec. V, describing how to characterize the phase diagram', 'formation — a local, reduced density matrix on N sites, say.', 'and determine critical points, before concluding in Sec. VI.'], 'rec_scores': array([0.99388635, ..., 0.99304372]), 'rec_polys': array([[[ 352,  105],
        ...,
        [ 352,  128]],

       ...,

       [[ 632, 1431],
        ...,
        [ 632, 1447]]], dtype=int16), 'rec_boxes': array([[ 352, ...,  128],
       ...,
       [ 632, ..., 1447]], dtype=int16)}}

If save_path is specified, the visualization results are saved to the save_path directory. The visualized output is shown below. [Image: visualized OCR result]
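
The text_det_params entry in the output above lists an unclip_ratio of 1.5. In DB-style detection post-processing, each detected polygon is usually expanded outward by an offset of area × ratio / perimeter. The sketch below computes that offset in plain Python; it assumes PaddleOCR's unclip_ratio follows this convention, and the function names and the sample rectangle are illustrative.

```python
import math


def polygon_area(poly):
    """Area of a simple polygon [[x, y], ...] via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(poly):
    """Sum of the edge lengths of the polygon."""
    return sum(
        math.dist(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))
    )


def unclip_offset(poly, unclip_ratio=1.5):
    """Expansion distance used to grow a shrunk text region: area * ratio / perimeter."""
    return polygon_area(poly) * unclip_ratio / polygon_perimeter(poly)


# A 100 x 20 pixel text line (illustrative coordinates).
line = [[0, 0], [100, 0], [100, 20], [0, 20]]
print(unclip_offset(line))  # 2000 * 1.5 / 240
```

A larger ratio grows the boxes more, which can help when detected regions clip the edges of characters, at the cost of more overlap between neighboring lines.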

The command line is convenient for a quick trial. Integrating the pipeline into a project takes only a few lines of code.

from paddleocr import PaddleOCR

ocr = PaddleOCR(
    text_detection_model_name="PP-OCRv5_mobile_det",
    use_doc_orientation_classify=False,  # Enable/disable the document orientation classification model
    use_doc_unwarping=False,  # Enable/disable the document unwarping module
    use_textline_orientation=True,  # Enable/disable the text line orientation classification model
    device="gpu:0",  # Device used for model inference (here, the first GPU)
)
result = ocr.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png")
for res in result:
    res.print()  # Print the prediction result
    res.save_to_img("output")  # Save the visualization to the output directory
    res.save_to_json("output")  # Save the result as JSON to the output directory
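
The printed pipeline result contains parallel fields rec_texts, rec_scores, and rec_boxes. The sketch below pairs them up, drops low-confidence lines, and orders the rest top to bottom. It works on plain lists and assumes rec_boxes uses [x_min, y_min, x_max, y_max]; the helper name, the threshold, and the sample scores and coordinates are illustrative.

```python
def confident_lines(rec_texts, rec_scores, rec_boxes, min_score=0.9):
    """Pair texts with scores and boxes, keep confident ones, sort by position."""
    lines = []
    for text, score, box in zip(rec_texts, rec_scores, rec_boxes):
        if score >= min_score:
            lines.append(
                {"text": text, "score": float(score), "box": tuple(int(v) for v in box)}
            )
    # Sort by the top edge first, then by the left edge.
    lines.sort(key=lambda line: (line["box"][1], line["box"][0]))
    return lines


# Hypothetical sample in the shape of the fields shown in the output.
texts = [
    "Andrew J. Ferris and David Poulin",
    "Algorithms for the Markov Entropy Decomposition",
    "blurry fragment",
]
scores = [0.98, 0.99, 0.42]
boxes = [[500, 150, 820, 170], [352, 105, 900, 128], [100, 300, 180, 320]]

for line in confident_lines(texts, scores, boxes):
    print(f"{line['score']:.2f}  {line['text']}")
```

Sorting by the top edge is a naive reading order: on multi-column pages such as the sample paper, lines from both columns interleave, so a layout-aware step would be needed there.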

The pipeline uses PP-OCRv5_server_det by default, so PP-OCRv5_mobile_det must be selected explicitly with the text_detection_model_name parameter. You can also load local model files with the text_detection_model_dir parameter. For details on the commands and parameters, refer to the documentation.
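
One way to keep the model choice in a single place is to build the constructor arguments with a small helper. This is a sketch only: `build_ocr_kwargs` and the local directory name are hypothetical, while the parameter names are the ones used in this section.

```python
def build_ocr_kwargs(model_name="PP-OCRv5_mobile_det", model_dir=None, device="cpu"):
    """Assemble keyword arguments for PaddleOCR, optionally pointing at local model files."""
    kwargs = {
        "text_detection_model_name": model_name,
        "use_doc_orientation_classify": False,
        "use_doc_unwarping": False,
        "use_textline_orientation": True,
        "device": device,
    }
    if model_dir is not None:
        # Only pass the directory when local model files should be used.
        kwargs["text_detection_model_dir"] = str(model_dir)
    return kwargs


# "./PP-OCRv5_mobile_det_infer" is a placeholder path, not a documented location.
kwargs = build_ocr_kwargs(model_dir="./PP-OCRv5_mobile_det_infer")
print(kwargs)

# With PaddleOCR installed, the dictionary can be unpacked into the constructor:
# from paddleocr import PaddleOCR
# ocr = PaddleOCR(**kwargs)
```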

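PP-StructureV3

Layout analysis is a technique for extracting structured information from document images. PP-StructureV3 contains the following six modules.

Layout detection module
General OCR pipeline
Document image preprocessing pipeline (optional)
Table recognition pipeline (optional)
Seal recognition pipeline (optional)
Formula recognition pipeline (optional)

Run the following command to try the PP-StructureV3 pipeline right away.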

paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
    --text_detection_model_name PP-OCRv5_mobile_det \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --device gpu:0

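The result is printed to the terminal. If save_path is specified, the results are saved to the save_path directory. The predicted Markdown is visualized below. [Image: visualized Markdown output]

Pipeline inference also takes only a few lines of code. The example below uses the PP-StructureV3 pipeline.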

from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    text_detection_model_name="PP-OCRv5_mobile_det",
    use_doc_orientation_classify=False,  # Enable/disable the document orientation classification model
    use_doc_unwarping=False,  # Enable/disable the document unwarping module
    use_textline_orientation=False,  # Enable/disable the text line orientation classification model
    device="gpu:0",  # Device used for model inference (here, the first GPU)
)
output = pipeline.predict("./pp_structure_v3_demo.png")  # Path to a local image
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the structured result of the current image as JSON
    res.save_to_markdown(save_path="output")  # Save the result of the current image as Markdown
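
The examples in this section hard-code device="gpu:0", which will not work on a CPU-only install. The sketch below picks a device string at runtime; `pick_device` is a hypothetical helper, and it assumes that a CUDA build with at least one visible GPU is sufficient to use "gpu:0".

```python
def pick_device():
    """Return "gpu:0" when a CUDA-enabled PaddlePaddle sees a GPU, otherwise "cpu"."""
    try:
        import paddle
    except ImportError:
        # PaddlePaddle is not installed; fall back to the CPU string.
        return "cpu"
    if paddle.device.is_compiled_with_cuda() and paddle.device.cuda.device_count() > 0:
        return "gpu:0"
    return "cpu"


device = pick_device()
print(f"using device: {device}")

# The value can then be passed to the pipelines shown above, for example:
# pipeline = PPStructureV3(text_detection_model_name="PP-OCRv5_mobile_det", device=device)
```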