Model Overview
Model Features
Model Capabilities
Use Cases
🚀 PP-OCRv5_mobile_det
PP-OCRv5_mobile_det is a text detection model from the PP-OCRv5_det series, developed by the PaddleOCR team. It efficiently and accurately detects text in various scenarios and languages, including handwriting, vertical, rotated, and curved text. With strong adaptability to complex layouts, different text sizes, and challenging backgrounds, it's suitable for practical applications like document analysis, license plate recognition, and scene text detection.
Handwritten Chinese | Handwritten English | Printed Chinese | Printed English | Traditional Chinese | Ancient Text | Japanese | General Scenario | Pinyin | Rotation | Distortion | Artistic Text | Average |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.744 | 0.777 | 0.905 | 0.910 | 0.823 | 0.581 | 0.727 | 0.721 | 0.575 | 0.647 | 0.827 | 0.525 | 0.770 |
🚀 Quick Start
📦 Installation
PaddlePaddle
Install PaddlePaddle using pip with the following commands:
# for CUDA11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# for CUDA12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# for CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
For detailed PaddlePaddle installation instructions, refer to the PaddlePaddle official website.
PaddleOCR
Install the latest PaddleOCR inference package from PyPI:
python -m pip install paddleocr
💻 Usage Examples
Basic Usage
You can quickly test the model with a single command:
paddleocr text_detection \
--model_name PP-OCRv5_mobile_det \
-i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png
You can also integrate the text detection model into your project. Download the sample image to your local machine before running the following code:
from paddleocr import TextDetection
model = TextDetection(model_name="PP-OCRv5_mobile_det")
output = model.predict(input="3ul2Rq4Sk5Cn-l69D695U.png", batch_size=1)
for res in output:
res.print()
res.save_to_img(save_path="./output/")
res.save_to_json(save_path="./output/res.json")
The result after running the code is as follows:
{'res': {'input_path': '/root/.paddlex/predict_input/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'dt_polys': array([[[ 105, 1431],
...,
[ 105, 1452]],
...,
[[ 353, 106],
...,
[ 353, 129]]], dtype=int16), 'dt_scores': [0.8306416015066644, 0.7603795581201811, ..., 0.8819806867477359]}}
The visualized image:
For detailed usage commands and parameter descriptions, refer to the Document.
Pipeline Usage
PP-OCRv5
The general OCR pipeline extracts text from images and outputs it as strings. It consists of five modules:
- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)
- Text Line Orientation Classification Module (Optional)
- Text Detection Module
- Text Recognition Module
Run the following command to quickly test the OCR pipeline:
paddleocr ocr -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png \
--text_detection_model_name PP-OCRv5_mobile_det \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation True \
--save_path ./output \
--device gpu:0
The results will be printed in the terminal:
{'res': {'input_path': 'printing_en/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': True}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1}, 'dt_polys': array([[[ 352, 105],
...,
[ 352, 128]],
...,
[[ 632, 1431],
...,
[ 632, 1447]]], dtype=int16), 'text_det_params': {'limit_side_len': 64, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([0, ..., 0]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['Algorithms for the Markov Entropy Decomposition', 'Andrew J. Ferris and David Poulin', 'Département de Physique, Université de Sherbrooke, Québec, JI K 2R1, Canada', '(Dated: October 31, 2018)', 'The Markov entropy decomposition (MED) is a recently-proposed, cluster-based simulation method for fi -', 'nite temperature quantum systems with arbitrary geometry. In this paper, we detail numerical algorithms for', 'performing the required steps of the MED, principally solving a minimization problem with a preconditioned', 'arXiv:1212.1442v1 [cond-mat.stat-mech] 6 Dec 2012', "Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate", 'the power of the method with the spin-1/2 XXZ model on the 2D square lattice, including the extraction of', 'critical points and details of each phase. Although the method shares some qualitative similarities with exact-', 'diagonalization, we show the MED is both more accurate and significantly more flexible.', 'PACS numbers: 05.10.—a, 02.50.Ng, 03.67.–a, 74.40.Kb', 'I. INTRODUCTION', 'This approximation becomes exact in the case of a 1D quan-', 'tum (or classical) Markov chain [1O], and leads to an expo-', 'Although the equations governing quantum many-body', 'nential reduction of cost for exact entropy calculations when', 'systems are simple to write down, finding solutions for the', 'the global density matrix is a higher-dimensional Markov net-', 'majority of systems remains incredibly difficult. Modern', 'work state [12, 13].', 'physics finds itself in need of new tools to compute the emer-', 'The second approximation used in the MED approach is', 'gent behavior of large, many-body systems.', 'related to the N-representibility problem. Given a set of lo-', 'There has been a great variety of tools developed to tackle', 'cal but overlapping reduced density matrices { ρi }, it is a very', 'many-body problems, but in general, large 2D and 3D quan-', 'challenging problem to determine if there exists a global den.', 'tum systems remain hard to deal with. Most systems are', 'sity operator which is positive semi-definite and whose partial', 'thought to be non-integrable, so exact analytic solutions are', 'trace agrees with each ρi. This problem is QMA-hard (the', 'not usually expected. Direct numerical diagonalization can be', 'quantum analogue of NP) [14, 15], and is hopelessly diffi-', 'performed for relatively small systems — however the emer-', 'cult to enforce. Thus, the second approximation employed', 'gent behavior of a system in the thermodynamic limit may be', 'involves ignoring global consistency with a positive opera-', 'difficult to extract, especially in systems with large correlation', 'tor, while requiring local consistency on any overlapping re-', 'lengths. Monte Carlo approaches are technically exact (up to', 'gions between the ρi. At the zero-temperature limit, the MED', 'sampling error), but suffer from the so-called sign problem', 'approach becomes analogous to the variational nth-order re-', 'for fermionic, frustrated, or dynamical problems. Thus we are', 'duced density matrix approach, where positivity is enforced', 'limited to search for clever approximations to solve the ma-', 'on all reduced density matrices of size n [16–18].', 'jority of many-body problems.', 'The MED approach is an extremely flexible cluster method.', 'Over the past century, hundreds of such approximations', 'applicable to both translationally invariant systems of any di-', 'have been proposed, and we will mention just a few notable', 'mension in the thermodynamic limit, as well as finite systems', 'examples applicable to quantum lattice models. Mean-field', 'or systems without translational invariance (e.g. disordered', 'theory is simple and frequently arrives at the correct quali-', 'lattices, or harmonically trapped atoms in optical lattices).', 'tative description, but often fails when correlations are im-', 'The free energy given by MED is guaranteed to lower bound', 'portant. Density-matrix renormalisation group (DMRG) [1]', 'the true free energy, which in turn lower-bounds the ground', 'is efficient and extremely accurate at solving 1D problems,', 'state energy — thus providing a natural complement to varia-', 'but the computational cost grows exponentially with system', 'tional approaches which upper-bound the ground state energy.', 'size in two- or higher-dimensions [2, 3]. Related tensor-', 'The ability to provide a rigorous ground-state energy window', 'network techniques designed for 2D systems are still in their', 'is a powerful validation tool, creating a very compelling rea-', 'infancy [4–6]. Series-expansion methods [7] can be success-', 'son to use this approach.', 'ful, but may diverge or otherwise converge slowly, obscuring', 'In this paper we paper we present a pedagogical introduc-', 'the state in certain regimes. There exist a variety of cluster-', 'tion to MED, including numerical implementation issues and', 'based techniques, such as dynamical-mean-field theory [8]', 'applications to 2D quantum lattice models in the thermody-', 'and density-matrix embedding [9]', 'namic limit. In Sec. II. we giye a brief deriyation of the', 'Here we discuss the so-called Markov entropy decompo-', 'Markov entropy decomposition. Section III outlines a robust', 'sition (MED), recently proposed by Poulin & Hastings [10]', 'numerical strategy for optimizing the clusters that make up', '(and analogous to a slightly earlier classical algorithm [11]).', 'the decomposition. In Sec. IV we show how we can extend', 'This is a self-consistent cluster method for fi nite temperature', 'these algorithms to extract non-trivial information, such as', 'systems that takes advantage of an approximation of the (von', 'specific heat and susceptibilities. We present an application of', 'Neumann) entropy. In [10], it was shown that the entropy', 'the method to the spin-1/2 XXZ model on a 2D square lattice', 'per site can be rigorously upper bounded using only local in-', 'in Sec. V, describing how to characterize the phase diagram', 'formation — a local, reduced density matrix on N sites, say.', 'and determine critical points, before concluding in Sec. VI.'], 'rec_scores': array([0.99388635, ..., 0.99304372]), 'rec_polys': array([[[ 352, 105],
...,
[ 352, 128]],
...,
[[ 632, 1431],
...,
[ 632, 1447]]], dtype=int16), 'rec_boxes': array([[ 352, ..., 128],
...,
[ 632, ..., 1447]], dtype=int16)}}
If save_path
is specified, the visualization results will be saved in that directory. The visualization output:
For project integration, you can use the following code:
from paddleocr import PaddleOCR
ocr = PaddleOCR(
text_detection_model_name="PP-OCRv5_mobile_det",
use_doc_orientation_classify=False, # Use use_doc_orientation_classify to enable/disable document orientation classification model
use_doc_unwarping=False, # Use use_doc_unwarping to enable/disable document unwarping module
use_textline_orientation=True, # Use use_textline_orientation to enable/disable textline orientation classification model
device="gpu:0", # Use device to specify GPU for model inference
)
result = ocr.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png")
for res in result:
res.print()
res.save_to_img("output")
res.save_to_json("output")
The default text detection model in the pipeline is PP-OCRv5_server_det
. You need to specify PP-OCRv5_mobile_det
using the text_detection_model_name
argument. You can also use a local model file with the text_detection_model_dir
argument. For detailed usage commands and parameter descriptions, refer to the Document.
PP-StructureV3
Layout analysis extracts structured information from document images. PP-StructureV3 includes six modules:
- Layout Detection Module
- General OCR Pipeline
- Document Image Preprocessing Pipeline (Optional)
- Table Recognition Pipeline (Optional)
- Seal Recognition Pipeline (Optional)
- Formula Recognition Pipeline (Optional)
Run the following command to quickly test the PP-StructureV3 pipeline:
paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
--text_detection_model_name PP-OCRv5_mobile_det \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation False \
--device gpu:0
The results will be printed in the terminal. If save_path
is specified, the results will be saved there. The predicted markdown visualization:
You can use the following code to test the pipeline:
from paddleocr import PPStructureV3
pipeline = PPStructureV3(
text_detection_model_name="PP-OCRv5_mobile_det",
use_doc_orientation_classify=False, # Use use_doc_orientation_classify to enable/disable document orientation classification model
use_doc_unwarping=False, # Use use_doc_unwarping to enable/disable document unwarping module
use_textline_orientation=False, # Use use_textline_orientation to enable/disable textline orientation classification model
device="gpu:0", # Use device to specify GPU for model inference
)
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
res.print() # Print the structured prediction output
res.save_to_json(save_path="output") ## Save the current image's structured result in JSON format
res.save_to_markdown(save_path="output") ## Save the current image's result in Markdown format
The default text detection model in the pipeline is PP-OCRv5_server_det
. You need to specify PP-OCRv5_mobile_det
using the text_detection_model_name
argument. You can also use a local model file with the text_detection_model_dir
argument. For detailed usage commands and parameter descriptions, refer to the Document.
📚 Documentation
📄 License
This project is licensed under the Apache-2.0 license.








