🚀 PP-OCRv5_mobile_det
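PP-OCRv5_mobile_det is a member of PP-OCRv5_det, the latest generation of text detection models developed by the PaddleOCR team. It detects text efficiently and accurately across a wide range of scenarios, including handwritten, vertical, rotated, and curved text, and supports languages such as Simplified Chinese, Traditional Chinese, English, and Japanese. It handles complex layouts, text of varying sizes, and challenging backgrounds robustly, which makes it suitable for practical applications such as document analysis, license plate recognition, and scene text detection. The key accuracy metrics are as follows:
Handwritten Chinese | Handwritten English | Printed Chinese | Printed English | Traditional Chinese | Ancient Text | Japanese | General Scenario | Pinyin | Rotated Text | Distorted Text | Artistic Text | Average |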
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.744 | 0.777 | 0.905 | 0.910 | 0.823 | 0.581 | 0.727 | 0.721 | 0.575 | 0.647 | 0.827 | 0.525 | 0.770 |
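As a quick sanity check on the table, the sketch below recomputes the plain arithmetic mean of the twelve category scores. It comes out lower than the reported average, so the Average column is presumably weighted in some way (for example by the number of samples per category). That weighting is an assumption; the per-category sample counts are not given here.

```python
# Per-category scores copied from the table above.
scores = {
    "Handwritten Chinese": 0.744,
    "Handwritten English": 0.777,
    "Printed Chinese": 0.905,
    "Printed English": 0.910,
    "Traditional Chinese": 0.823,
    "Ancient Text": 0.581,
    "Japanese": 0.727,
    "General Scenario": 0.721,
    "Pinyin": 0.575,
    "Rotated Text": 0.647,
    "Distorted Text": 0.827,
    "Artistic Text": 0.525,
}

# Unweighted mean over the twelve categories.
unweighted_mean = sum(scores.values()) / len(scores)

print(f"unweighted mean:  {unweighted_mean:.3f}")  # 0.730
print("reported average: 0.770")
```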
🚀 Quick Start
📦 Installation
1. Install PaddlePaddle
Install PaddlePaddle with pip using the command that matches your environment:
# For CUDA 11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# For CUDA 12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# For CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
For details on installing PaddlePaddle, see the official PaddlePaddle website.
2. Install PaddleOCR
Install the latest version of the PaddleOCR inference package from PyPI:
python -m pip install paddleocr
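After installation, it can be useful to confirm which packages actually ended up in the environment. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical and not part of PaddleOCR.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Only one of paddlepaddle / paddlepaddle-gpu is expected to be present,
# depending on which install command was used.
for name in ("paddlepaddle", "paddlepaddle-gpu", "paddleocr"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```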
💻 Usage
Basic Usage
You can try the model with a single command:
paddleocr text_detection \
--model_name PP-OCRv5_mobile_det \
-i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png
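You can also integrate inference with the text detection module into your own project. Before running the following code, download the sample image to your local machine.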
from paddleocr import TextDetection
model = TextDetection(model_name="PP-OCRv5_mobile_det")
output = model.predict(input="3ul2Rq4Sk5Cn-l69D695U.png", batch_size=1)
for res in output:
    res.print()
    res.save_to_img(save_path="./output/")
    res.save_to_json(save_path="./output/res.json")
Running this code produces the following result:
{'res': {'input_path': '/root/.paddlex/predict_input/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'dt_polys': array([[[ 105, 1431],
...,
[ 105, 1452]],
...,
[[ 353, 106],
...,
[ 353, 129]]], dtype=int16), 'dt_scores': [0.8306416015066644, 0.7603795581201811, ..., 0.8819806867477359]}}
The visualized image is shown below:
For details on the commands and their parameters, see the documentation.
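The dt_polys and dt_scores fields shown in the result above are often post-processed before use. The sketch below keeps only detections above a score threshold and converts each polygon into an axis-aligned bounding box. The helper names and the sample polygons are hypothetical and chosen for illustration; they are not values produced by the model.

```python
from typing import List, Sequence, Tuple

Point = Sequence[int]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def poly_to_bbox(poly: Sequence[Point]) -> Box:
    """Return the axis-aligned bounding box that encloses a polygon."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs), min(ys), max(xs), max(ys)


def filter_detections(dt_polys, dt_scores, min_score: float = 0.8) -> List[Box]:
    """Keep detections scoring at least min_score and return their boxes."""
    return [
        poly_to_bbox(poly)
        for poly, score in zip(dt_polys, dt_scores)
        if score >= min_score
    ]


# Hypothetical data in the same layout as 'dt_polys' / 'dt_scores'.
dt_polys = [
    [[10, 20], [110, 22], [108, 50], [8, 48]],
    [[200, 300], [320, 300], [320, 330], [200, 330]],
]
dt_scores = [0.83, 0.76]

boxes = filter_detections(dt_polys, dt_scores, min_score=0.8)
print(boxes)  # [(8, 20, 110, 50)]
```

The same functions also accept the NumPy arrays returned by the model, since they only iterate over points and index the two coordinates.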
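Advanced Usage
A single model can only do so much, but a pipeline that combines several models can handle harder real-world problems.
PP-OCRv5
The general OCR pipeline solves text recognition tasks: it extracts the text in an image and outputs it as strings. The pipeline consists of 5 modules:
- Document image orientation classification module (optional)
- Text image unwarping module (optional)
- Text line orientation classification module (optional)
- Text detection module
- Text recognition module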
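To make the role of the optional modules concrete, here is a purely conceptual sketch of which stages run for a given set of flags. The flag names match the pipeline parameters used in this document. The stage labels and the execution order (document preprocessing first, then detection, then per-line orientation, then recognition) are assumptions for illustration and do not reproduce PaddleOCR's internal implementation.

```python
from typing import List


def enabled_stages(
    use_doc_orientation_classify: bool = False,
    use_doc_unwarping: bool = False,
    use_textline_orientation: bool = False,
) -> List[str]:
    """Return the stages that would run, in the assumed execution order."""
    stages = []
    if use_doc_orientation_classify:
        stages.append("doc_orientation_classify")
    if use_doc_unwarping:
        stages.append("doc_unwarping")
    # Detection and recognition are the two mandatory modules.
    stages.append("text_detection")
    if use_textline_orientation:
        stages.append("textline_orientation")
    stages.append("text_recognition")
    return stages


print(enabled_stages(use_textline_orientation=True))
# ['text_detection', 'textline_orientation', 'text_recognition']
```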
Run the following command to try the OCR pipeline:
paddleocr ocr -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png \
--text_detection_model_name PP-OCRv5_mobile_det \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation True \
--save_path ./output \
--device gpu:0
The result is printed to the terminal:
{'res': {'input_path': 'printing_en/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': True}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1}, 'dt_polys': array([[[ 352, 105],
...,
[ 352, 128]],
...,
[[ 632, 1431],
...,
[ 632, 1447]]], dtype=int16), 'text_det_params': {'limit_side_len': 64, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([0, ..., 0]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['Algorithms for the Markov Entropy Decomposition', 'Andrew J. Ferris and David Poulin', 'Département de Physique, Université de Sherbrooke, Québec, JI K 2R1, Canada', '(Dated: October 31, 2018)', 'The Markov entropy decomposition (MED) is a recently-proposed, cluster-based simulation method for fi -', 'nite temperature quantum systems with arbitrary geometry. In this paper, we detail numerical algorithms for', 'performing the required steps of the MED, principally solving a minimization problem with a preconditioned', 'arXiv:1212.1442v1 [cond-mat.stat-mech] 6 Dec 2012', "Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate", 'the power of the method with the spin-1/2 XXZ model on the 2D square lattice, including the extraction of', 'critical points and details of each phase. Although the method shares some qualitative similarities with exact-', 'diagonalization, we show the MED is both more accurate and significantly more flexible.', 'PACS numbers: 05.10.—a, 02.50.Ng, 03.67.–a, 74.40.Kb', 'I. INTRODUCTION', 'This approximation becomes exact in the case of a 1D quan-', 'tum (or classical) Markov chain [1O], and leads to an expo-', 'Although the equations governing quantum many-body', 'nential reduction of cost for exact entropy calculations when', 'systems are simple to write down, finding solutions for the', 'the global density matrix is a higher-dimensional Markov net-', 'majority of systems remains incredibly difficult. 
Modern', 'work state [12, 13].', 'physics finds itself in need of new tools to compute the emer-', 'The second approximation used in the MED approach is', 'gent behavior of large, many-body systems.', 'related to the N-representibility problem. Given a set of lo-', 'There has been a great variety of tools developed to tackle', 'cal but overlapping reduced density matrices { ρi }, it is a very', 'many-body problems, but in general, large 2D and 3D quan-', 'challenging problem to determine if there exists a global den.', 'tum systems remain hard to deal with. Most systems are', 'sity operator which is positive semi-definite and whose partial', 'thought to be non-integrable, so exact analytic solutions are', 'trace agrees with each ρi. This problem is QMA-hard (the', 'not usually expected. Direct numerical diagonalization can be', 'quantum analogue of NP) [14, 15], and is hopelessly diffi-', 'performed for relatively small systems — however the emer-', 'cult to enforce. Thus, the second approximation employed', 'gent behavior of a system in the thermodynamic limit may be', 'involves ignoring global consistency with a positive opera-', 'difficult to extract, especially in systems with large correlation', 'tor, while requiring local consistency on any overlapping re-', 'lengths. Monte Carlo approaches are technically exact (up to', 'gions between the ρi. At the zero-temperature limit, the MED', 'sampling error), but suffer from the so-called sign problem', 'approach becomes analogous to the variational nth-order re-', 'for fermionic, frustrated, or dynamical problems. 
Thus we are', 'duced density matrix approach, where positivity is enforced', 'limited to search for clever approximations to solve the ma-', 'on all reduced density matrices of size n [16–18].', 'jority of many-body problems.', 'The MED approach is an extremely flexible cluster method.', 'Over the past century, hundreds of such approximations', 'applicable to both translationally invariant systems of any di-', 'have been proposed, and we will mention just a few notable', 'mension in the thermodynamic limit, as well as finite systems', 'examples applicable to quantum lattice models. Mean-field', 'or systems without translational invariance (e.g. disordered', 'theory is simple and frequently arrives at the correct quali-', 'lattices, or harmonically trapped atoms in optical lattices).', 'tative description, but often fails when correlations are im-', 'The free energy given by MED is guaranteed to lower bound', 'portant. Density-matrix renormalisation group (DMRG) [1]', 'the true free energy, which in turn lower-bounds the ground', 'is efficient and extremely accurate at solving 1D problems,', 'state energy — thus providing a natural complement to varia-', 'but the computational cost grows exponentially with system', 'tional approaches which upper-bound the ground state energy.', 'size in two- or higher-dimensions [2, 3]. Related tensor-', 'The ability to provide a rigorous ground-state energy window', 'network techniques designed for 2D systems are still in their', 'is a powerful validation tool, creating a very compelling rea-', 'infancy [4–6]. Series-expansion methods [7] can be success-', 'son to use this approach.', 'ful, but may diverge or otherwise converge slowly, obscuring', 'In this paper we paper we present a pedagogical introduc-', 'the state in certain regimes. 
There exist a variety of cluster-', 'tion to MED, including numerical implementation issues and', 'based techniques, such as dynamical-mean-field theory [8]', 'applications to 2D quantum lattice models in the thermody-', 'and density-matrix embedding [9]', 'namic limit. In Sec. II. we giye a brief deriyation of the', 'Here we discuss the so-called Markov entropy decompo-', 'Markov entropy decomposition. Section III outlines a robust', 'sition (MED), recently proposed by Poulin & Hastings [10]', 'numerical strategy for optimizing the clusters that make up', '(and analogous to a slightly earlier classical algorithm [11]).', 'the decomposition. In Sec. IV we show how we can extend', 'This is a self-consistent cluster method for fi nite temperature', 'these algorithms to extract non-trivial information, such as', 'systems that takes advantage of an approximation of the (von', 'specific heat and susceptibilities. We present an application of', 'Neumann) entropy. In [10], it was shown that the entropy', 'the method to the spin-1/2 XXZ model on a 2D square lattice', 'per site can be rigorously upper bounded using only local in-', 'in Sec. V, describing how to characterize the phase diagram', 'formation — a local, reduced density matrix on N sites, say.', 'and determine critical points, before concluding in Sec. VI.'], 'rec_scores': array([0.99388635, ..., 0.99304372]), 'rec_polys': array([[[ 352, 105],
...,
[ 352, 128]],
...,
[[ 632, 1431],
...,
[ 632, 1447]]], dtype=int16), 'rec_boxes': array([[ 352, ..., 128],
...,
[ 632, ..., 1447]], dtype=int16)}}
If save_path is specified, the visualization results are saved under the save_path directory. The visualized output is shown below:
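In the result above, the recognized lines of the two-column sample page alternate between the left and right columns. The sketch below shows one way to filter lines by score and reorder them column by column. It assumes that each entry in rec_boxes is laid out as [x_min, y_min, x_max, y_max]; the helper name, the column_split_x parameter, and the sample data are hypothetical.

```python
from typing import List, Optional


def lines_in_reading_order(
    res: dict,
    min_score: float = 0.5,
    column_split_x: Optional[int] = None,
) -> List[str]:
    """Return recognized lines sorted top to bottom, optionally per column.

    `res` is the inner dictionary stored under the 'res' key of the printed result.
    """
    items = [
        (box, text)
        for text, score, box in zip(res["rec_texts"], res["rec_scores"], res["rec_boxes"])
        if score >= min_score
    ]
    if column_split_x is None:
        # Single column: sort by top edge, then by left edge.
        ordered = sorted(items, key=lambda it: (it[0][1], it[0][0]))
    else:
        # Two columns: read the whole left column first, then the right one.
        left = [it for it in items if it[0][0] < column_split_x]
        right = [it for it in items if it[0][0] >= column_split_x]
        ordered = sorted(left, key=lambda it: it[0][1]) + sorted(right, key=lambda it: it[0][1])
    return [text for _, text in ordered]


# Hypothetical result fragment, not actual model output.
res = {
    "rec_texts": ["right line 1", "left line 1", "left line 2", "noise"],
    "rec_scores": [0.99, 0.98, 0.97, 0.10],
    "rec_boxes": [
        [600, 100, 1000, 120],
        [50, 100, 500, 120],
        [50, 130, 500, 150],
        [50, 160, 80, 170],
    ],
}

print(lines_in_reading_order(res, column_split_x=550))
# ['left line 1', 'left line 2', 'right line 1']
```

A fixed split position is the simplest choice for a known layout. For arbitrary documents, a layout analysis step is a better way to find the columns.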
The command line is convenient for a quick trial. Integrating the pipeline into a project also takes only a few lines of code:
from paddleocr import PaddleOCR

ocr = PaddleOCR(
    text_detection_model_name="PP-OCRv5_mobile_det",
    use_doc_orientation_classify=False,  # Enable/disable the document orientation classification model
    use_doc_unwarping=False,  # Enable/disable the document unwarping module
    use_textline_orientation=True,  # Enable/disable the text line orientation classification model
    device="gpu:0",  # Device used for model inference (here, GPU 0)
)
result = ocr.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png")
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
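The pipeline uses PP-OCRv5_server_det by default, so you must set the text_detection_model_name parameter to PP-OCRv5_mobile_det explicitly. You can also load local model files through the text_detection_model_dir parameter. For details on the commands and their parameters, see the documentation.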
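The two parameters mentioned above can be assembled in one place before constructing the pipeline. The following is a minimal sketch: the helper detection_kwargs and the directory ./my_det_model are hypothetical, and passing both parameters together is assumed to be accepted by the pipeline constructor.

```python
from pathlib import Path
from typing import Optional


def detection_kwargs(
    model_name: str = "PP-OCRv5_mobile_det",
    model_dir: Optional[str] = None,
) -> dict:
    """Build the keyword arguments that select the text detection model."""
    kwargs = {"text_detection_model_name": model_name}
    if model_dir is not None:
        path = Path(model_dir)
        # Fail early with a clear message instead of at pipeline start-up.
        if not path.is_dir():
            raise FileNotFoundError(f"model directory not found: {path}")
        kwargs["text_detection_model_dir"] = str(path)
    return kwargs


print(detection_kwargs())
# {'text_detection_model_name': 'PP-OCRv5_mobile_det'}

# Usage with the pipeline (requires paddleocr; the directory is hypothetical):
# from paddleocr import PaddleOCR
# ocr = PaddleOCR(**detection_kwargs(model_dir="./my_det_model"))
```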
PP-StructureV3
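Layout analysis is a technique for extracting structured information from document images. PP-StructureV3 includes the following six modules:
- Layout detection module
- General OCR pipeline
- Document image preprocessing pipeline (optional)
- Table recognition pipeline (optional)
- Seal recognition pipeline (optional)
- Formula recognition pipeline (optional)
Run the following command to try the PP-StructureV3 pipeline: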
paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
--text_detection_model_name PP-OCRv5_mobile_det \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation False \
--device gpu:0
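The result is printed to the terminal. If save_path is specified, the results are saved under the save_path directory. The predicted Markdown visualization is shown below:
Pipeline inference takes only a few lines of code. Taking the PP-StructureV3 pipeline as an example: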
from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    text_detection_model_name="PP-OCRv5_mobile_det",
    use_doc_orientation_classify=False,  # Enable/disable the document orientation classification model
    use_doc_unwarping=False,  # Enable/disable the document unwarping module
    use_textline_orientation=False,  # Enable/disable the text line orientation classification model
    device="gpu:0",  # Device used for model inference (here, GPU 0)
)
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the structured result for the current image as JSON
    res.save_to_markdown(save_path="output")  # Save the result for the current image as Markdown
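The pipeline uses PP-OCRv5_server_det by default, so you must set the text_detection_model_name parameter to PP-OCRv5_mobile_det explicitly. You can also load local model files through the text_detection_model_dir parameter. For details on the commands and their parameters, see the documentation.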
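When several images are processed, the Markdown files written by save_to_markdown may need to be combined into one document. The sketch below merges every .md file in an output directory using only the standard library. The helper name, the merged file name, and the page_0.md / page_1.md file names in the demo are hypothetical; the actual names written by the pipeline may differ.

```python
import tempfile
from pathlib import Path


def merge_markdown(output_dir: str, merged_name: str = "merged.md") -> Path:
    """Concatenate all Markdown files in output_dir into a single file."""
    out = Path(output_dir)
    parts = [
        p.read_text(encoding="utf-8")
        for p in sorted(out.glob("*.md"))
        if p.name != merged_name  # Skip a merged file left over from a previous run
    ]
    merged = out / merged_name
    merged.write_text("\n\n".join(parts), encoding="utf-8")
    return merged


# Demo on a temporary directory with two stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "page_0.md").write_text("# Page 0", encoding="utf-8")
    Path(tmp, "page_1.md").write_text("# Page 1", encoding="utf-8")
    merged_text = merge_markdown(tmp).read_text(encoding="utf-8")

print(merged_text)
```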
📚 Documentation
📄 License
This project is released under the Apache-2.0 license.











