🚀 PP-OCRv5_mobile_det
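PP-OCRv5_mobile_det is a member of PP-OCRv5_det, the latest generation of text detection models developed by the PaddleOCR team. It detects text efficiently and accurately across many scenarios, including handwritten, vertical, rotated, and curved text, and supports Simplified Chinese, Traditional Chinese, English, Japanese, and other languages. It handles complex layouts, text of varying sizes, and challenging backgrounds robustly, which makes it suitable for practical applications such as document analysis, license plate recognition, and scene text detection. The key accuracy metrics are as follows: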
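Handwritten Chinese | Handwritten English | Printed Chinese | Printed English | Traditional Chinese | Ancient Text | Japanese | General Scenario | Pinyin | Rotated Text | Distorted Text | Artistic Text | Average |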
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.744 | 0.777 | 0.905 | 0.910 | 0.823 | 0.581 | 0.727 | 0.721 | 0.575 | 0.647 | 0.827 | 0.525 | 0.770 |
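As a rough way to read the table, the sketch below loads the per-category scores listed above and ranks them from weakest to strongest. The dictionary keys are English renderings of the column headers chosen for this example; nothing here is part of PaddleOCR. It also computes the plain unweighted mean of the twelve categories, which comes out near 0.730 rather than the reported 0.770, so the published average is presumably weighted in some way (for example by sample count). That interpretation is an assumption, not something the source states.

```python
# Per-category scores copied from the table above.
scores = {
    "Handwritten Chinese": 0.744,
    "Handwritten English": 0.777,
    "Printed Chinese": 0.905,
    "Printed English": 0.910,
    "Traditional Chinese": 0.823,
    "Ancient Text": 0.581,
    "Japanese": 0.727,
    "General Scenario": 0.721,
    "Pinyin": 0.575,
    "Rotated Text": 0.647,
    "Distorted Text": 0.827,
    "Artistic Text": 0.525,
}
reported_average = 0.770  # "Average" column of the table

# Rank categories from weakest to strongest.
ranked = sorted(scores.items(), key=lambda item: item[1])

# Plain unweighted mean over the twelve categories.
unweighted_mean = sum(scores.values()) / len(scores)

print("Weakest categories:")
for name, score in ranked[:3]:
    print(f"  {name}: {score:.3f}")
print(f"Unweighted mean: {unweighted_mean:.3f} (reported average: {reported_average:.3f})")
```

If a target workload is dominated by one of the weaker categories, such as artistic text or pinyin, it may be worth validating the model on representative samples before relying on the average.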
🚀 Quick Start
📦 Installation
1. Install PaddlePaddle
Use one of the following pip commands to install PaddlePaddle, depending on your environment:
# For CUDA 11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# For CUDA 12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# For CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
For details on installing PaddlePaddle, please refer to the official PaddlePaddle website.
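When the installation has to be scripted, for instance in a setup script that runs on both GPU and CPU machines, the three commands above can be generated from a small lookup table. The following is a minimal sketch: the helper name `build_install_command` and the keys `cu118`, `cu126`, and `cpu` are labels invented for this example, while the package names, version, and index URLs are copied from the commands above.

```python
import sys

# Package name and index URL per target, copied from the pip commands above.
# The dictionary keys are labels chosen for this sketch.
PADDLE_INDEXES = {
    "cu118": ("paddlepaddle-gpu", "https://www.paddlepaddle.org.cn/packages/stable/cu118/"),
    "cu126": ("paddlepaddle-gpu", "https://www.paddlepaddle.org.cn/packages/stable/cu126/"),
    "cpu": ("paddlepaddle", "https://www.paddlepaddle.org.cn/packages/stable/cpu/"),
}


def build_install_command(target: str, version: str = "3.0.0") -> list[str]:
    """Return the pip command (as an argument list) for the given target."""
    if target not in PADDLE_INDEXES:
        raise ValueError(f"unknown target {target!r}; expected one of {sorted(PADDLE_INDEXES)}")
    package, index_url = PADDLE_INDEXES[target]
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}", "-i", index_url]


# Print the command instead of running it, so the sketch has no side effects.
print(" ".join(build_install_command("cpu")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to perform the installation.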
2. Install PaddleOCR
Install the latest version of the PaddleOCR inference package from PyPI:
python -m pip install paddleocr
💻 Usage Examples
Basic Usage
You can try the model quickly with a single command:
paddleocr text_detection \
--model_name PP-OCRv5_mobile_det \
-i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png
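You can also integrate inference with the text detection module into your own project. Before running the following code, download the sample image to your local machine.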
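from paddleocr import TextDetection

# Load the detection model by name
model = TextDetection(model_name="PP-OCRv5_mobile_det")
# Run inference on the locally downloaded sample image
output = model.predict(input="3ul2Rq4Sk5Cn-l69D695U.png", batch_size=1)
for res in output:
    res.print()  # Print the result to the terminal
    res.save_to_img(save_path="./output/")  # Save the visualization image
    res.save_to_json(save_path="./output/res.json")  # Save the result as JSON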
After running the code, the result is:
{'res': {'input_path': '/root/.paddlex/predict_input/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'dt_polys': array([[[ 105, 1431],
...,
[ 105, 1452]],
...,
[[ 353, 106],
...,
[ 353, 129]]], dtype=int16), 'dt_scores': [0.8306416015066644, 0.7603795581201811, ..., 0.8819806867477359]}}
The visualized image is as follows:
For details on usage commands and parameter descriptions, please refer to the documentation.
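The detection result shown above pairs each polygon in dt_polys with a confidence in dt_scores. Downstream code often wants axis-aligned boxes and a confidence cutoff, so the sketch below shows one way to derive them. It works on a plain dictionary that mirrors the printed structure; the helper `polys_to_boxes`, the `min_score` value of 0.8, and the sample coordinates are all hypothetical (the printed output elides most points, and they are not filled in here).

```python
def polys_to_boxes(dt_polys, dt_scores, min_score=0.8):
    """Turn polygons into (x_min, y_min, x_max, y_max, score) tuples,
    skipping detections whose score is below min_score."""
    boxes = []
    for poly, score in zip(dt_polys, dt_scores):
        if score < min_score:
            continue
        xs = [int(point[0]) for point in poly]
        ys = [int(point[1]) for point in poly]
        boxes.append((min(xs), min(ys), max(xs), max(ys), float(score)))
    return boxes


# Hypothetical sample shaped like the 'res' dictionary printed above.
sample = {
    "dt_polys": [
        [[105, 1431], [410, 1431], [410, 1452], [105, 1452]],
        [[353, 106], [720, 106], [720, 129], [353, 129]],
        [[40, 600], [90, 600], [90, 620], [40, 620]],
    ],
    "dt_scores": [0.83, 0.88, 0.55],
}

boxes = polys_to_boxes(sample["dt_polys"], sample["dt_scores"], min_score=0.8)
for box in boxes:
    print(box)
```

Because the function only iterates and indexes, it should also accept the NumPy array form of dt_polys seen in the printed result, though that has not been verified against a live run here.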
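Advanced Usage
A single model has limited capabilities, but a pipeline composed of several models can do more to solve difficult problems in real-world scenarios.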
PP-OCRv5
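The general OCR pipeline solves text recognition tasks by extracting text from images and outputting it as strings. The pipeline consists of five modules:
- Document image orientation classification module (optional)
- Text image unwarping module (optional)
- Text line orientation classification module (optional)
- Text detection module
- Text recognition module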
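To make the optional/required split in the list above concrete, the following sketch maps the three switches used later in this document (use_doc_orientation_classify, use_doc_unwarping, use_textline_orientation) to the set of active modules. The function `enabled_modules` is a hypothetical helper written for illustration, its defaults are arbitrary and may differ from the library's, and the returned list follows the order of the list above rather than the pipeline's actual execution order.

```python
def enabled_modules(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
):
    """List the modules that are active for a given combination of switches.

    Detection and recognition are always present; the other three are optional.
    """
    modules = []
    if use_doc_orientation_classify:
        modules.append("document image orientation classification")
    if use_doc_unwarping:
        modules.append("text image unwarping")
    if use_textline_orientation:
        modules.append("text line orientation classification")
    modules.extend(["text detection", "text recognition"])
    return modules


# Same switch combination as the CLI example in this section.
for name in enabled_modules(use_textline_orientation=True):
    print(name)
```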
Run the following command to try the OCR pipeline quickly:
paddleocr ocr -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png \
--text_detection_model_name PP-OCRv5_mobile_det \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation True \
--save_path ./output \
--device gpu:0
The results are printed to the terminal:
{'res': {'input_path': 'printing_en/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': True}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1}, 'dt_polys': array([[[ 352, 105],
...,
[ 352, 128]],
...,
[[ 632, 1431],
...,
[ 632, 1447]]], dtype=int16), 'text_det_params': {'limit_side_len': 64, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([0, ..., 0]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['Algorithms for the Markov Entropy Decomposition', 'Andrew J. Ferris and David Poulin', 'Département de Physique, Université de Sherbrooke, Québec, JI K 2R1, Canada', '(Dated: October 31, 2018)', 'The Markov entropy decomposition (MED) is a recently-proposed, cluster-based simulation method for fi -', 'nite temperature quantum systems with arbitrary geometry. In this paper, we detail numerical algorithms for', 'performing the required steps of the MED, principally solving a minimization problem with a preconditioned', 'arXiv:1212.1442v1 [cond-mat.stat-mech] 6 Dec 2012', "Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate", 'the power of the method with the spin-1/2 XXZ model on the 2D square lattice, including the extraction of', 'critical points and details of each phase. Although the method shares some qualitative similarities with exact-', 'diagonalization, we show the MED is both more accurate and significantly more flexible.', 'PACS numbers: 05.10.—a, 02.50.Ng, 03.67.–a, 74.40.Kb', 'I. INTRODUCTION', 'This approximation becomes exact in the case of a 1D quan-', 'tum (or classical) Markov chain [1O], and leads to an expo-', 'Although the equations governing quantum many-body', 'nential reduction of cost for exact entropy calculations when', 'systems are simple to write down, finding solutions for the', 'the global density matrix is a higher-dimensional Markov net-', 'majority of systems remains incredibly difficult. 
Modern', 'work state [12, 13].', 'physics finds itself in need of new tools to compute the emer-', 'The second approximation used in the MED approach is', 'gent behavior of large, many-body systems.', 'related to the N-representibility problem. Given a set of lo-', 'There has been a great variety of tools developed to tackle', 'cal but overlapping reduced density matrices { ρi }, it is a very', 'many-body problems, but in general, large 2D and 3D quan-', 'challenging problem to determine if there exists a global den.', 'tum systems remain hard to deal with. Most systems are', 'sity operator which is positive semi-definite and whose partial', 'thought to be non-integrable, so exact analytic solutions are', 'trace agrees with each ρi. This problem is QMA-hard (the', 'not usually expected. Direct numerical diagonalization can be', 'quantum analogue of NP) [14, 15], and is hopelessly diffi-', 'performed for relatively small systems — however the emer-', 'cult to enforce. Thus, the second approximation employed', 'gent behavior of a system in the thermodynamic limit may be', 'involves ignoring global consistency with a positive opera-', 'difficult to extract, especially in systems with large correlation', 'tor, while requiring local consistency on any overlapping re-', 'lengths. Monte Carlo approaches are technically exact (up to', 'gions between the ρi. At the zero-temperature limit, the MED', 'sampling error), but suffer from the so-called sign problem', 'approach becomes analogous to the variational nth-order re-', 'for fermionic, frustrated, or dynamical problems. 
Thus we are', 'duced density matrix approach, where positivity is enforced', 'limited to search for clever approximations to solve the ma-', 'on all reduced density matrices of size n [16–18].', 'jority of many-body problems.', 'The MED approach is an extremely flexible cluster method.', 'Over the past century, hundreds of such approximations', 'applicable to both translationally invariant systems of any di-', 'have been proposed, and we will mention just a few notable', 'mension in the thermodynamic limit, as well as finite systems', 'examples applicable to quantum lattice models. Mean-field', 'or systems without translational invariance (e.g. disordered', 'theory is simple and frequently arrives at the correct quali-', 'lattices, or harmonically trapped atoms in optical lattices).', 'tative description, but often fails when correlations are im-', 'The free energy given by MED is guaranteed to lower bound', 'portant. Density-matrix renormalisation group (DMRG) [1]', 'the true free energy, which in turn lower-bounds the ground', 'is efficient and extremely accurate at solving 1D problems,', 'state energy — thus providing a natural complement to varia-', 'but the computational cost grows exponentially with system', 'tional approaches which upper-bound the ground state energy.', 'size in two- or higher-dimensions [2, 3]. Related tensor-', 'The ability to provide a rigorous ground-state energy window', 'network techniques designed for 2D systems are still in their', 'is a powerful validation tool, creating a very compelling rea-', 'infancy [4–6]. Series-expansion methods [7] can be success-', 'son to use this approach.', 'ful, but may diverge or otherwise converge slowly, obscuring', 'In this paper we paper we present a pedagogical introduc-', 'the state in certain regimes. 
There exist a variety of cluster-', 'tion to MED, including numerical implementation issues and', 'based techniques, such as dynamical-mean-field theory [8]', 'applications to 2D quantum lattice models in the thermody-', 'and density-matrix embedding [9]', 'namic limit. In Sec. II. we giye a brief deriyation of the', 'Here we discuss the so-called Markov entropy decompo-', 'Markov entropy decomposition. Section III outlines a robust', 'sition (MED), recently proposed by Poulin & Hastings [10]', 'numerical strategy for optimizing the clusters that make up', '(and analogous to a slightly earlier classical algorithm [11]).', 'the decomposition. In Sec. IV we show how we can extend', 'This is a self-consistent cluster method for fi nite temperature', 'these algorithms to extract non-trivial information, such as', 'systems that takes advantage of an approximation of the (von', 'specific heat and susceptibilities. We present an application of', 'Neumann) entropy. In [10], it was shown that the entropy', 'the method to the spin-1/2 XXZ model on a 2D square lattice', 'per site can be rigorously upper bounded using only local in-', 'in Sec. V, describing how to characterize the phase diagram', 'formation — a local, reduced density matrix on N sites, say.', 'and determine critical points, before concluding in Sec. VI.'], 'rec_scores': array([0.99388635, ..., 0.99304372]), 'rec_polys': array([[[ 352, 105],
...,
[ 352, 128]],
...,
[[ 632, 1431],
...,
[ 632, 1447]]], dtype=int16), 'rec_boxes': array([[ 352, ..., 128],
...,
[ 632, ..., 1447]], dtype=int16)}}
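If save_path is specified, the visualization results are saved under the save_path directory. The visualization output is shown below:
The command-line method is intended for a quick trial. For project integration, only a few lines of code are needed: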
from paddleocr import PaddleOCR

ocr = PaddleOCR(
    text_detection_model_name="PP-OCRv5_mobile_det",
    use_doc_orientation_classify=False,  # Enable/disable the document orientation classification model
    use_doc_unwarping=False,  # Enable/disable the document unwarping module
    use_textline_orientation=True,  # Enable/disable the text line orientation classification model
    device="gpu:0",  # Device used for model inference
)
result = ocr.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png")
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
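The pipeline uses PP-OCRv5_server_det by default, so you need to set the text_detection_model_name parameter to PP-OCRv5_mobile_det explicitly. You can also use local model files through the text_detection_model_dir parameter. For details on usage commands and parameter descriptions, please refer to the documentation.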
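The pipeline result printed earlier in this section carries parallel fields rec_texts, rec_scores, and rec_boxes. A common next step is to keep only confident lines and put them in a rough reading order. The sketch below does that on a plain dictionary shaped like the printed result. The helper `confident_lines`, the 0.9 threshold, and the sample values are hypothetical, and it assumes each rec_boxes entry is laid out as x_min, y_min, x_max, y_max, which the elided output suggests but does not fully show.

```python
def confident_lines(res, min_score=0.9):
    """Return recognized lines with score >= min_score, sorted top-to-bottom,
    then left-to-right, using the top-left corner of each box."""
    lines = []
    for text, score, box in zip(res["rec_texts"], res["rec_scores"], res["rec_boxes"]):
        if score < min_score:
            continue
        x_min, y_min, x_max, y_max = (int(value) for value in box)
        lines.append({"text": text, "score": float(score), "box": (x_min, y_min, x_max, y_max)})
    lines.sort(key=lambda line: (line["box"][1], line["box"][0]))
    return lines


# Hypothetical sample; the texts echo the printed result, the numbers are made up.
sample = {
    "rec_texts": [
        "I. INTRODUCTION",
        "Algorithms for the Markov Entropy Decomposition",
        "smudged fragment",
    ],
    "rec_scores": [0.97, 0.99, 0.42],
    "rec_boxes": [[400, 520, 600, 540], [352, 105, 900, 128], [50, 300, 120, 320]],
}

lines = confident_lines(sample)
for line in lines:
    print(f"{line['score']:.2f}  {line['text']}")
```

Sorting by the top-left corner is a simplification: on a two-column page like the sample image it interleaves the columns, which matches the order seen in the printed rec_texts, so column-aware ordering would need extra logic.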
PP-StructureV3
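Layout analysis is a technique for extracting structured information from document images. PP-StructureV3 consists of the following six modules:
- Layout detection module
- General OCR pipeline
- Document image preprocessing pipeline (optional)
- Table recognition pipeline (optional)
- Seal recognition pipeline (optional)
- Formula recognition pipeline (optional)
Run the following command to try the PP-StructureV3 pipeline quickly: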
paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
--text_detection_model_name PP-OCRv5_mobile_det \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation False \
--device gpu:0
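The results are printed to the terminal. If save_path is specified, the results are saved under the save_path directory. The predicted Markdown visualization is shown below:
Running inference with the pipeline takes only a few lines of code. Taking the PP-StructureV3 pipeline as an example: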
from paddleocr import PPStructureV3

pipeline = PPStructureV3(
    text_detection_model_name="PP-OCRv5_mobile_det",
    use_doc_orientation_classify=False,  # Enable/disable the document orientation classification model
    use_doc_unwarping=False,  # Enable/disable the document unwarping module
    use_textline_orientation=False,  # Enable/disable the text line orientation classification model
    device="gpu:0",  # Device used for model inference
)
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the structured result of the current image as JSON
    res.save_to_markdown(save_path="output")  # Save the result of the current image as Markdown
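The pipeline uses PP-OCRv5_server_det by default, so you need to set the text_detection_model_name parameter to PP-OCRv5_mobile_det explicitly. You can also use local model files through the text_detection_model_dir parameter. For details on usage commands and parameter descriptions, please refer to the documentation.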
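Since both pipelines in this document take the same text_detection_model_name, text_detection_model_dir, and device parameters, the choice of detection model can be kept in one place and reused. The sketch below builds such a keyword-argument dictionary; the helper `detection_model_kwargs` and the local path are hypothetical, and the dictionary is only printed so that the example runs without PaddleOCR installed.

```python
def detection_model_kwargs(model_name="PP-OCRv5_mobile_det", model_dir=None, device="gpu:0"):
    """Build the keyword arguments that select the text detection model.

    model_dir is only included when a local model directory is given.
    """
    kwargs = {"text_detection_model_name": model_name, "device": device}
    if model_dir is not None:
        kwargs["text_detection_model_dir"] = model_dir
    return kwargs


# Hypothetical local directory; replace it with wherever the model files live.
shared = detection_model_kwargs(model_dir="./models/PP-OCRv5_mobile_det", device="cpu")
print(shared)

# With paddleocr installed, the same dictionary could be passed to either pipeline:
#   from paddleocr import PaddleOCR, PPStructureV3
#   ocr = PaddleOCR(**shared)
#   pipeline = PPStructureV3(**shared)
```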
📚 Detailed Documentation
📄 License
This project is licensed under the Apache-2.0 license.