PP-OCRv5_mobile_det Open-source Text Detection Model - Efficiently Detect Text in Multiple Languages and Scenarios

PP OCRv5 Mobile Det

Developed by PaddlePaddle

PP-OCRv5_mobile_det is the latest-generation lightweight text detection model developed by the PaddleOCR team. It supports efficient text detection across multiple languages and scenarios.

Text Recognition · Supports Multiple Languages · Open Source License: Apache-2.0
Tags: Multi-scenario text detection, Multi-language support, Lightweight deployment

Downloads 556

Release Time : 6/4/2025

Model Overview

This model can efficiently and accurately support text detection in various scenarios, including handwritten, vertical, rotated, and curved text. The supported languages include Simplified Chinese, Traditional Chinese, English, and Japanese. It is suitable for practical applications such as document analysis, license plate recognition, and scene text detection.

Model Features

Multi-scenario adaptation

Can robustly handle complex layouts, texts of different sizes, and challenging backgrounds

Multi-language support

Supports multiple languages such as Simplified Chinese, Traditional Chinese, English, and Japanese

Efficient and lightweight

A lightweight model optimized for mobile devices, reducing computational resource requirements while maintaining high performance

Complex text processing

Can handle special text forms such as handwritten, vertical, rotated, and curved text

Model Capabilities

Text detection

Handwritten text detection

Printed text detection

Multi-language text detection

Rotated text detection

Curved text detection

Use Cases

Document processing

Document analysis

Extract text regions from scanned documents or photos

Accurately identify the position and direction of text in the document

Scene text recognition

License plate recognition

Detect the position of license plates in vehicle images

Accurately frame the license plate area for subsequent recognition

Street view text recognition

Extract text information such as billboards and road signs from street view photos

Accurately identify text regions in complex backgrounds

Special text processing

Handwritten note recognition

Extract text regions from photos of handwritten notes

Accurately identify the position of handwritten text

Ancient book digitization

Locate text regions in scanned ancient books

Supports text detection for Traditional Chinese and ancient Chinese text

🚀 PP-OCRv5_mobile_det

PP-OCRv5_mobile_det is a text detection model from the PP-OCRv5_det series, developed by the PaddleOCR team. It efficiently and accurately detects text in various scenarios and languages, including handwritten, vertical, rotated, and curved text. Because it adapts well to complex layouts, varying text sizes, and challenging backgrounds, it is suitable for practical applications such as document analysis, license plate recognition, and scene text detection.

Handwritten Chinese	Handwritten English	Printed Chinese	Printed English	Traditional Chinese	Ancient Text	Japanese	General Scenario	Pinyin	Rotation	Distortion	Artistic Text	Average
0.744	0.777	0.905	0.910	0.823	0.581	0.727	0.721	0.575	0.647	0.827	0.525	0.770

🚀 Quick Start

📦 Installation

PaddlePaddle

Install PaddlePaddle using pip with the following commands:

# for CUDA11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# for CUDA12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# for CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

For detailed PaddlePaddle installation instructions, refer to the PaddlePaddle official website.
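If the installation is scripted, the three commands above can be generated from one small lookup. This is only a sketch: the helper name `install_command` and the `INDEX_SUFFIX` table are illustrative and not part of PaddlePaddle; the package names, version, and index URLs are copied from the commands listed above.

```python
from typing import Optional

INDEX_BASE = "https://www.paddlepaddle.org.cn/packages/stable"
# Keys are CUDA versions; None selects the CPU build.
INDEX_SUFFIX = {"11.8": "cu118", "12.6": "cu126", None: "cpu"}


def install_command(cuda: Optional[str] = None, version: str = "3.0.0") -> str:
    """Return the pip command for the given CUDA version (hypothetical helper)."""
    if cuda not in INDEX_SUFFIX:
        raise ValueError(f"No install command listed for CUDA {cuda!r}")
    package = "paddlepaddle" if cuda is None else "paddlepaddle-gpu"
    return (
        f"python -m pip install {package}=={version} "
        f"-i {INDEX_BASE}/{INDEX_SUFFIX[cuda]}/"
    )


print(install_command("12.6"))
print(install_command())  # CPU build
```

Only the CUDA versions listed above are covered; any other value raises a `ValueError` rather than guessing an index URL.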

PaddleOCR

Install the latest PaddleOCR inference package from PyPI:

python -m pip install paddleocr

💻 Usage Examples

Basic Usage

You can quickly test the model with a single command:

paddleocr text_detection \
    --model_name PP-OCRv5_mobile_det \
    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png
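When the command is launched from a Python script, it can help to assemble the argument list programmatically. The following is a minimal sketch: `build_command` and the image path `sample.png` are hypothetical, and only the `text_detection` subcommand and the `-i` and `--model_name` flags come from the command above.

```python
import shlex
from typing import Dict, List, Union


def build_command(
    subcommand: str, image: str, options: Dict[str, Union[str, bool]]
) -> List[str]:
    """Assemble a paddleocr CLI argument list (hypothetical helper)."""
    cmd = ["paddleocr", subcommand, "-i", image]
    for name, value in options.items():
        # Every option becomes "--name value"; booleans are written True/False.
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_command(
    "text_detection",
    "sample.png",  # hypothetical local image path
    {"model_name": "PP-OCRv5_mobile_det"},
)
print(shlex.join(cmd))
# With paddleocr installed, the list could be run via subprocess.run(cmd, check=True).
```

Passing a list to `subprocess.run` avoids shell quoting issues for paths that contain spaces.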

You can also integrate the text detection model into your project. Download the sample image to your local machine before running the following code:

from paddleocr import TextDetection
model = TextDetection(model_name="PP-OCRv5_mobile_det")
output = model.predict(input="3ul2Rq4Sk5Cn-l69D695U.png", batch_size=1)
for res in output:
    res.print()
    res.save_to_img(save_path="./output/")
    res.save_to_json(save_path="./output/res.json")

The result after running the code is as follows:

{'res': {'input_path': '/root/.paddlex/predict_input/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'dt_polys': array([[[ 105, 1431],
        ...,
        [ 105, 1452]],

       ...,

       [[ 353,  106],
        ...,
        [ 353,  129]]], dtype=int16), 'dt_scores': [0.8306416015066644, 0.7603795581201811, ..., 0.8819806867477359]}}

The visualized image:

[Image: visualized text detection result]

For detailed usage commands and parameter descriptions, refer to the PaddleOCR documentation.
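In the result printed above, `dt_polys` holds one polygon per detected region and `dt_scores` holds the matching confidence values. The sketch below shows one way such a result might be post-processed, assuming the inner dictionary has exactly that layout. The helper `polys_to_boxes`, the 0.8 threshold, and the sample coordinates are illustrative and not part of PaddleOCR.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def polys_to_boxes(res: Dict, min_score: float = 0.8) -> List[Tuple[Box, float]]:
    """Turn detection polygons into axis-aligned boxes, dropping low scores."""
    boxes = []
    for poly, score in zip(res["dt_polys"], res["dt_scores"]):
        if score < min_score:
            continue
        xs = [int(point[0]) for point in poly]
        ys = [int(point[1]) for point in poly]
        boxes.append(((min(xs), min(ys), max(xs), max(ys)), float(score)))
    # Order boxes top-to-bottom, then left-to-right.
    boxes.sort(key=lambda item: (item[0][1], item[0][0]))
    return boxes


# Hypothetical sample shaped like the inner 'res' dictionary printed above.
sample = {
    "dt_polys": [
        [[105, 1431], [400, 1431], [400, 1452], [105, 1452]],
        [[353, 106], [700, 106], [700, 129], [353, 129]],
        [[50, 600], [90, 600], [90, 620], [50, 620]],
    ],
    "dt_scores": [0.83, 0.88, 0.42],
}

boxes = polys_to_boxes(sample)
for box, score in boxes:
    print(box, round(score, 2))
```

The function only iterates and indexes its inputs, so it should accept NumPy arrays as well as nested lists. Axis-aligned boxes discard the rotation of a polygon, which may matter for rotated or curved text.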

Pipeline Usage

PP-OCRv5

The general OCR pipeline extracts text from images and outputs it as strings. It consists of five modules:

Document Image Orientation Classification Module (Optional)
Text Image Unwarping Module (Optional)
Text Line Orientation Classification Module (Optional)
Text Detection Module
Text Recognition Module

Run the following command to quickly test the OCR pipeline:

paddleocr ocr -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png \
    --text_detection_model_name PP-OCRv5_mobile_det \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation True \
    --save_path ./output \
    --device gpu:0

The results will be printed in the terminal:

{'res': {'input_path': 'printing_en/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': True}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1}, 'dt_polys': array([[[ 352,  105],
        ...,
        [ 352,  128]],

       ...,

       [[ 632, 1431],
        ...,
        [ 632, 1447]]], dtype=int16), 'text_det_params': {'limit_side_len': 64, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([0, ..., 0]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['Algorithms for the Markov Entropy Decomposition', 'Andrew J. Ferris and David Poulin', 'Département de Physique, Université de Sherbrooke, Québec, JI K 2R1, Canada', '(Dated: October 31, 2018)', 'The Markov entropy decomposition (MED) is a recently-proposed, cluster-based simulation method for fi -', 'nite temperature quantum systems with arbitrary geometry. In this paper, we detail numerical algorithms for', 'performing the required steps of the MED, principally solving a minimization problem with a preconditioned', 'arXiv:1212.1442v1 [cond-mat.stat-mech] 6 Dec 2012', "Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate", 'the power of the method with the spin-1/2 XXZ model on the 2D square lattice, including the extraction of', 'critical points and details of each phase. Although the method shares some qualitative similarities with exact-', 'diagonalization, we show the MED is both more accurate and significantly more flexible.', 'PACS numbers: 05.10.—a, 02.50.Ng, 03.67.–a, 74.40.Kb', 'I. INTRODUCTION', 'This approximation becomes exact in the case of a 1D quan-', 'tum (or classical) Markov chain [1O], and leads to an expo-', 'Although the equations governing quantum many-body', 'nential reduction of cost for exact entropy calculations when', 'systems are simple to write down, finding solutions for the', 'the global density matrix is a higher-dimensional Markov net-', 'majority of systems remains incredibly difficult. 
Modern', 'work state [12, 13].', 'physics finds itself in need of new tools to compute the emer-', 'The second approximation used in the MED approach is', 'gent behavior of large, many-body systems.', 'related to the N-representibility problem. Given a set of lo-', 'There has been a great variety of tools developed to tackle', 'cal but overlapping reduced density matrices { ρi }, it is a very', 'many-body problems, but in general, large 2D and 3D quan-', 'challenging problem to determine if there exists a global den.', 'tum systems remain hard to deal with. Most systems are', 'sity operator which is positive semi-definite and whose partial', 'thought to be non-integrable, so exact analytic solutions are', 'trace agrees with each ρi. This problem is QMA-hard (the', 'not usually expected. Direct numerical diagonalization can be', 'quantum analogue of NP) [14, 15], and is hopelessly diffi-', 'performed for relatively small systems — however the emer-', 'cult to enforce. Thus, the second approximation employed', 'gent behavior of a system in the thermodynamic limit may be', 'involves ignoring global consistency with a positive opera-', 'difficult to extract, especially in systems with large correlation', 'tor, while requiring local consistency on any overlapping re-', 'lengths. Monte Carlo approaches are technically exact (up to', 'gions between the ρi. At the zero-temperature limit, the MED', 'sampling error), but suffer from the so-called sign problem', 'approach becomes analogous to the variational nth-order re-', 'for fermionic, frustrated, or dynamical problems. 
Thus we are', 'duced density matrix approach, where positivity is enforced', 'limited to search for clever approximations to solve the ma-', 'on all reduced density matrices of size n [16–18].', 'jority of many-body problems.', 'The MED approach is an extremely flexible cluster method.', 'Over the past century, hundreds of such approximations', 'applicable to both translationally invariant systems of any di-', 'have been proposed, and we will mention just a few notable', 'mension in the thermodynamic limit, as well as finite systems', 'examples applicable to quantum lattice models. Mean-field', 'or systems without translational invariance (e.g. disordered', 'theory is simple and frequently arrives at the correct quali-', 'lattices, or harmonically trapped atoms in optical lattices).', 'tative description, but often fails when correlations are im-', 'The free energy given by MED is guaranteed to lower bound', 'portant. Density-matrix renormalisation group (DMRG) [1]', 'the true free energy, which in turn lower-bounds the ground', 'is efficient and extremely accurate at solving 1D problems,', 'state energy — thus providing a natural complement to varia-', 'but the computational cost grows exponentially with system', 'tional approaches which upper-bound the ground state energy.', 'size in two- or higher-dimensions [2, 3]. Related tensor-', 'The ability to provide a rigorous ground-state energy window', 'network techniques designed for 2D systems are still in their', 'is a powerful validation tool, creating a very compelling rea-', 'infancy [4–6]. Series-expansion methods [7] can be success-', 'son to use this approach.', 'ful, but may diverge or otherwise converge slowly, obscuring', 'In this paper we paper we present a pedagogical introduc-', 'the state in certain regimes. 
There exist a variety of cluster-', 'tion to MED, including numerical implementation issues and', 'based techniques, such as dynamical-mean-field theory [8]', 'applications to 2D quantum lattice models in the thermody-', 'and density-matrix embedding [9]', 'namic limit. In Sec. II. we giye a brief deriyation of the', 'Here we discuss the so-called Markov entropy decompo-', 'Markov entropy decomposition. Section III outlines a robust', 'sition (MED), recently proposed by Poulin & Hastings [10]', 'numerical strategy for optimizing the clusters that make up', '(and analogous to a slightly earlier classical algorithm [11]).', 'the decomposition. In Sec. IV we show how we can extend', 'This is a self-consistent cluster method for fi nite temperature', 'these algorithms to extract non-trivial information, such as', 'systems that takes advantage of an approximation of the (von', 'specific heat and susceptibilities. We present an application of', 'Neumann) entropy. In [10], it was shown that the entropy', 'the method to the spin-1/2 XXZ model on a 2D square lattice', 'per site can be rigorously upper bounded using only local in-', 'in Sec. V, describing how to characterize the phase diagram', 'formation — a local, reduced density matrix on N sites, say.', 'and determine critical points, before concluding in Sec. VI.'], 'rec_scores': array([0.99388635, ..., 0.99304372]), 'rec_polys': array([[[ 352,  105],
        ...,
        [ 352,  128]],

       ...,

       [[ 632, 1431],
        ...,
        [ 632, 1447]]], dtype=int16), 'rec_boxes': array([[ 352, ...,  128],
       ...,
       [ 632, ..., 1447]], dtype=int16)}}

If save_path is specified, the visualization results will be saved in that directory. The visualization output:

[Image: visualized OCR pipeline result]
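The pipeline result printed above stores recognized strings, scores, and boxes in the parallel fields `rec_texts`, `rec_scores`, and `rec_boxes`. A minimal sketch of combining them follows. It assumes each box is ordered as `(x_min, y_min, x_max, y_max)`, which matches the printed values but is not confirmed here. The `Line` type, the `collect_lines` helper, the 0.9 threshold, and the sample values are illustrative.

```python
from typing import Dict, List, NamedTuple, Tuple


class Line(NamedTuple):
    text: str
    score: float
    box: Tuple[int, ...]  # assumed order: (x_min, y_min, x_max, y_max)


def collect_lines(res: Dict, min_score: float = 0.9) -> List[Line]:
    """Pair each recognized text with its score and box, dropping weak results."""
    lines = [
        Line(text, float(score), tuple(int(v) for v in box))
        for text, score, box in zip(
            res["rec_texts"], res["rec_scores"], res["rec_boxes"]
        )
        if score >= min_score
    ]
    # Sort by the top edge, then by the left edge.
    return sorted(lines, key=lambda line: (line.box[1], line.box[0]))


# Hypothetical sample shaped like the inner 'res' dictionary printed above.
sample = {
    "rec_texts": [
        "I. INTRODUCTION",
        "Algorithms for the Markov Entropy Decomposition",
        "smudge",
    ],
    "rec_scores": [0.98, 0.99, 0.35],
    "rec_boxes": [[400, 620, 640, 645], [352, 105, 900, 128], [10, 300, 60, 320]],
}

lines = collect_lines(sample)
for line in lines:
    print(f"{line.score:.2f} {line.box} {line.text}")
```

A plain top-to-bottom sort interleaves the columns of a multi-column page, so two-column documents like the sample would need column grouping before the text is joined.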

For project integration, you can use the following code:

from paddleocr import PaddleOCR

ocr = PaddleOCR(
    text_detection_model_name="PP-OCRv5_mobile_det",
    use_doc_orientation_classify=False,  # Enable/disable the document orientation classification model
    use_doc_unwarping=False,  # Enable/disable the document unwarping module
    use_textline_orientation=True,  # Enable/disable the text line orientation classification model
    device="gpu:0",  # Device used for model inference
)
result = ocr.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png")
for res in result:
    res.print()  # Print the prediction result
    res.save_to_img("output")  # Save the visualization image
    res.save_to_json("output")  # Save the result in JSON format

The default text detection model in the pipeline is PP-OCRv5_server_det, so you need to select PP-OCRv5_mobile_det explicitly with the text_detection_model_name argument. To load a local model instead, pass its directory with the text_detection_model_dir argument. For detailed usage commands and parameter descriptions, refer to the PaddleOCR documentation.
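The choice between the named model and a local model directory can be kept in one place. The sketch below builds the keyword arguments for the pipeline constructor. The helper `detection_kwargs` is hypothetical, and passing the model name together with the directory is an assumption, not confirmed behavior.

```python
from pathlib import Path
from typing import Dict, Optional


def detection_kwargs(model_dir: Optional[str] = None) -> Dict[str, str]:
    """Build the detection-related constructor arguments (hypothetical helper)."""
    kwargs = {"text_detection_model_name": "PP-OCRv5_mobile_det"}
    if model_dir is not None:
        path = Path(model_dir)
        if not path.is_dir():
            # Fail early instead of letting the pipeline start with a bad path.
            raise FileNotFoundError(f"Model directory not found: {path}")
        kwargs["text_detection_model_dir"] = str(path)
    return kwargs


print(detection_kwargs())
```

The returned dictionary could then be unpacked into the constructor, for example `PaddleOCR(**detection_kwargs("./my_det_model"))`, where `./my_det_model` is a placeholder path.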

PP-StructureV3

Layout analysis extracts structured information from document images. PP-StructureV3 includes six modules:

Layout Detection Module
General OCR Pipeline
Document Image Preprocessing Pipeline (Optional)
Table Recognition Pipeline (Optional)
Seal Recognition Pipeline (Optional)
Formula Recognition Pipeline (Optional)

Run the following command to quickly test the PP-StructureV3 pipeline:

paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
    --text_detection_model_name PP-OCRv5_mobile_det \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --device gpu:0

The results will be printed in the terminal. If save_path is specified, the results will be saved there. The predicted markdown visualization:

[Image: predicted Markdown visualization]

You can use the following code to test the pipeline:

from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    text_detection_model_name="PP-OCRv5_mobile_det",
    use_doc_orientation_classify=False,  # Enable/disable the document orientation classification model
    use_doc_unwarping=False,  # Enable/disable the document unwarping module
    use_textline_orientation=False,  # Enable/disable the text line orientation classification model
    device="gpu:0",  # Device used for model inference
)
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the structured result in JSON format
    res.save_to_markdown(save_path="output")  # Save the result in Markdown format

📄 License

This project is licensed under the Apache-2.0 license.
