🚀 PP-OCRv5_mobile_rec
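PP-OCRv5_mobile_rec is one of the PP-OCRv5_rec models, the latest generation of text line recognition models developed by the PaddleOCR team. It aims to support, efficiently and accurately with a single model, the recognition of four major languages (Simplified Chinese, Traditional Chinese, English, and Japanese) as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters. Its key accuracy metrics are as follows:
Handwritten Chinese | Handwritten English | Printed Chinese | Printed English | Traditional Chinese | Ancient Text | Japanese | General Scenario | Pinyin | Rotated Text | Distorted Text | Artistic Text | Average Accuracy |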
---|---|---|---|---|---|---|---|---|---|---|---|---|
0.4166 | 0.4944 | 0.8605 | 0.8753 | 0.7199 | 0.5786 | 0.7577 | 0.5570 | 0.7703 | 0.7248 | 0.8089 | 0.5398 | 0.8015 |
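Note: If any character (including punctuation) in a line is incorrect, the entire line is marked as wrong. This strict line-level criterion means the reported figures reflect accuracy as experienced in practical applications.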
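The line-level criterion in the note can be sketched in a few lines of Python. This is only an illustration of exact-match scoring, not the evaluation script used by the PaddleOCR team; the function name `line_accuracy` and the sample strings are assumptions made for the example.

```python
def line_accuracy(predictions, references):
    """Fraction of lines whose predicted text matches the reference exactly.

    A single wrong character (including punctuation) makes the whole line
    count as an error, mirroring the criterion described in the note.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


# Hypothetical sample: the second line misses one punctuation mark,
# so it is counted as wrong even though every letter is correct.
sample_predictions = ["day as a reminder of the", "Hello, world", "PaddleOCR"]
sample_references = ["day as a reminder of the", "Hello, world!", "PaddleOCR"]

if __name__ == "__main__":
    print(line_accuracy(sample_predictions, sample_references))
```

Under this metric, character-level accuracy can be high while line-level accuracy stays low, which helps explain why harder categories such as handwriting score noticeably lower in the table.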
🚀 Quick Start
📦 Installation
1. Install PaddlePaddle
Use one of the following pip commands to install PaddlePaddle:
# For CUDA 11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# For CUDA 12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# For CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
For details about PaddlePaddle installation, please refer to the official PaddlePaddle website.
2. Install PaddleOCR
Install the latest version of the PaddleOCR inference package from PyPI:
python -m pip install paddleocr
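After installation, it can be useful to confirm which packages are actually present in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is an assumption for this example, and the distribution names are the ones used in the pip commands above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Only one of paddlepaddle / paddlepaddle-gpu is expected to be installed.
    for name in ("paddlepaddle", "paddlepaddle-gpu", "paddleocr"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```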
💻 Usage
Basic Usage
You can quickly try out the model with a single command:
paddleocr text_recognition \
--model_name PP-OCRv5_mobile_rec \
-i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/2PZfbirjfxA88695lRmgk.jpeg
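You can also integrate inference with the text recognition module into your own project. Before running the following code, download the sample image to your local machine.
from paddleocr import TextRecognition
# Load the mobile recognition model by name
model = TextRecognition(model_name="PP-OCRv5_mobile_rec")
# Run recognition on the locally downloaded sample image
output = model.predict(input="2PZfbirjfxA88695lRmgk.jpeg", batch_size=1)
for res in output:
    res.print()  # Print the recognition result
    res.save_to_img(save_path="./output/")  # Save the visualization image
    res.save_to_json(save_path="./output/res.json")  # Save the result as JSON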
After running, the result is as follows:
{'res': {'input_path': '/root/.paddlex/predict_input/2PZfbirjfxA88695lRmgk.jpeg', 'page_index': None, 'rec_text': 'day as a reminder of the', 'rec_score': 0.9793617129325867}}
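The printed result carries the recognized string in `rec_text` and its confidence in `rec_score`. As a hedged sketch, assuming the structure above is available as a plain Python dict, a simple confidence gate could look like this. The helper name `accept_text` and the default threshold are illustrative assumptions, not part of the PaddleOCR API.

```python
# The result printed above, copied as a plain dict for illustration.
sample_result = {
    "res": {
        "input_path": "/root/.paddlex/predict_input/2PZfbirjfxA88695lRmgk.jpeg",
        "page_index": None,
        "rec_text": "day as a reminder of the",
        "rec_score": 0.9793617129325867,
    }
}


def accept_text(result, min_score=0.5):
    """Return rec_text if rec_score reaches min_score, otherwise None.

    min_score is an arbitrary illustrative threshold; tune it for your data.
    """
    res = result["res"]
    if res["rec_score"] >= min_score:
        return res["rec_text"]
    return None


if __name__ == "__main__":
    print(accept_text(sample_result))
```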
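The visualized image is as follows:
For details about usage commands and parameter descriptions, please refer to the documentation.
Pipeline Usage
The capability of a single model is limited, but a pipeline composed of several models can provide more capacity to solve difficult problems in real-world scenarios.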
PP-OCRv5
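The general OCR pipeline solves text recognition tasks by extracting text information from images and outputting it as strings. The pipeline contains 5 modules:
- Document Image Orientation Classification Module (Optional)
- Text Image Unwarping Module (Optional)
- Text Line Orientation Classification Module (Optional)
- Text Detection Module
- Text Recognition Module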
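To make the relationship between the optional modules and the pipeline switches concrete, here is a small sketch. It maps the three `use_*` flags that appear in the commands in this document to the modules in the list above, in the order the list gives them. The function `enabled_modules` is a hypothetical helper for illustration only; it does not describe how PaddleOCR schedules the modules internally.

```python
def enabled_modules(use_doc_orientation_classify,
                    use_doc_unwarping,
                    use_textline_orientation):
    """List the pipeline modules that are active for a given set of flags.

    Detection and recognition are always present; the other three modules
    are optional and controlled by the corresponding flag.
    """
    modules = []
    if use_doc_orientation_classify:
        modules.append("document image orientation classification")
    if use_doc_unwarping:
        modules.append("text image unwarping")
    if use_textline_orientation:
        modules.append("text line orientation classification")
    modules += ["text detection", "text recognition"]
    return modules


if __name__ == "__main__":
    # Same flag values as the command-line example in this section.
    print(enabled_modules(False, False, True))
```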
Run a single command to quickly try the OCR pipeline:
paddleocr ocr -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png \
--text_recognition_model_name PP-OCRv5_mobile_rec \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation True \
--save_path ./output \
--device gpu:0
The results are printed to the terminal:
{
"res": {
"input_path": "printing_en/1212.1442_1.png",
"page_index": null,
"model_settings": {
"use_doc_preprocessor": true,
"use_textline_orientation": true
},
"doc_preprocessor_res": {
"input_path": null,
"page_index": null,
"model_settings": {
"use_doc_orientation_classify": false,
"use_doc_unwarping": false
},
"angle": -1
},
"dt_polys": [
[
[
352,
105
],
...,
[
352,
128
]
],
...,
[
[
632,
1431
],
...,
[
632,
1447
]
]
],
"text_det_params": {
"limit_side_len": 64,
"limit_type": "min",
"thresh": 0.3,
"max_side_limit": 4000,
"box_thresh": 0.6,
"unclip_ratio": 1.5
},
"text_type": "general",
"textline_orientation_angles": [
0,
...,
0
],
"text_rec_score_thresh": 0.0,
"rec_texts": [
"Algorithms for the Markov Entropy Decomposition",
"Andrew J. Ferris and David Poulin",
"Département de Physique, Université de Sherbrooke, Québec, JI K 2R1, Canada",
"(Dated: October 31, 2018)",
"The Markov entropy decomposition (MED) is a recently - proposed, cluster - based simulation method for fi -",
"nite temperature quantum systems with arbitrary geometry. In this paper, we detail numerical algorithms for",
"performing the required steps of the MED, principally solving a minimization problem with a preconditioned",
"arXiv:1212.1442v1 [cond - mat.stat - mech] 6 Dec 2012",
"Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate",
"the power of the method with the spin - 1/2 XXZ model on the 2D square lattice, including the extraction of",
"critical points and details of each phase. Although the method shares some qualitative similarities with exact -",
"diagonalization, we show the MED is both more accurate and significantly more flexible.",
"PACS numbers: 05.10.—a, 02.50.Ng, 03.67.–a, 74.40.Kb",
"I. INTRODUCTION",
"This approximation becomes exact in the case of a 1D quan -",
"tum (or classical) Markov chain [1O], and leads to an expo -",
"Although the equations governing quantum many - body",
"nential reduction of cost for exact entropy calculations when",
"systems are simple to write down, finding solutions for the",
"the global density matrix is a higher - dimensional Markov net -",
"majority of systems remains incredibly difficult. Modern",
"work state [12, 13].",
"physics finds itself in need of new tools to compute the emer -",
"The second approximation used in the MED approach is",
"gent behavior of large, many - body systems.",
"related to the N - representibility problem. Given a set of lo -",
"There has been a great variety of tools developed to tackle",
"cal but overlapping reduced density matrices { ρi }, it is a very",
"many - body problems, but in general, large 2D and 3D quan -",
"challenging problem to determine if there exists a global den.",
"tum systems remain hard to deal with. Most systems are",
"sity operator which is positive semi - definite and whose partial",
"thought to be non - integrable, so exact analytic solutions are",
"trace agrees with each ρi. This problem is QMA - hard (the",
"not usually expected. Direct numerical diagonalization can be",
"quantum analogue of NP) [14, 15], and is hopelessly diffi -",
"performed for relatively small systems — however the emer -",
"cult to enforce. Thus, the second approximation employed",
"gent behavior of a system in the thermodynamic limit may be",
"involves ignoring global consistency with a positive opera -",
"difficult to extract, especially in systems with large correlation",
"tor, while requiring local consistency on any overlapping re -",
"lengths. Monte Carlo approaches are technically exact (up to",
"gions between the ρi. At the zero - temperature limit, the MED",
"sampling error), but suffer from the so - called sign problem",
"approach becomes analogous to the variational nth - order re -",
"for fermionic, frustrated, or dynamical problems. Thus we are",
"duced density matrix approach, where positivity is enforced",
"limited to search for clever approximations to solve the ma -",
"on all reduced density matrices of size n [16–18].",
"jority of many - body problems.",
"The MED approach is an extremely flexible cluster method.",
"Over the past century, hundreds of such approximations",
"applicable to both translationally invariant systems of any di -",
"have been proposed, and we will mention just a few notable",
"mension in the thermodynamic limit, as well as finite systems",
"examples applicable to quantum lattice models. Mean - field",
"or systems without translational invariance (e.g. disordered",
"theory is simple and frequently arrives at the correct quali -",
"lattices, or harmonically trapped atoms in optical lattices).",
"tative description, but often fails when correlations are im -",
"The free energy given by MED is guaranteed to lower bound",
"portant. Density - matrix renormalisation group (DMRG) [1]",
"the true free energy, which in turn lower - bounds the ground",
"is efficient and extremely accurate at solving 1D problems,",
"state energy — thus providing a natural complement to varia -",
"but the computational cost grows exponentially with system",
"tional approaches which upper - bound the ground state energy.",
"size in two - or higher - dimensions [2, 3]. Related tensor -",
"The ability to provide a rigorous ground - state energy window",
"network techniques designed for 2D systems are still in their",
"is a powerful validation tool, creating a very compelling rea -",
"infancy [4–6]. Series - expansion methods [7] can be success -",
"son to use this approach.",
"ful, but may diverge or otherwise converge slowly, obscuring",
"In this paper we paper we present a pedagogical introduc -",
"the state in certain regimes. There exist a variety of cluster -",
"tion to MED, including numerical implementation issues and",
"based techniques, such as dynamical - mean - field theory [8]",
"applications to 2D quantum lattice models in the thermody -",
"and density - matrix embedding [9]",
"namic limit. In Sec. II. we giye a brief deriyation of the",
"Here we discuss the so - called Markov entropy decompo -",
"Markov entropy decomposition. Section III outlines a robust",
"sition (MED), recently proposed by Poulin & Hastings [10]",
"numerical strategy for optimizing the clusters that make up",
"(and analogous to a slightly earlier classical algorithm [11]).",
"the decomposition. In Sec. IV we show how we can extend",
"This is a self - consistent cluster method for fi nite temperature",
"these algorithms to extract non - trivial information, such as",
"systems that takes advantage of an approximation of the (von",
"specific heat and susceptibilities. We present an application of",
"Neumann) entropy. In [10], it was shown that the entropy",
"the method to the spin - 1/2 XXZ model on a 2D square lattice",
"per site can be rigorously upper bounded using only local in -",
"in Sec. V, describing how to characterize the phase diagram",
"formation — a local, reduced density matrix on N sites, say.",
"and determine critical points, before concluding in Sec. VI."
],
"rec_scores": [
0.99388635,
...,
0.99304372
],
"rec_polys": [
[
[
352,
105
],
...,
[
352,
128
]
],
...,
[
[
632,
1431
],
...,
[
632,
1447
]
]
],
"rec_boxes": [
[
352,
...,
128
],
...,
[
632,
...,
1447
]
]
}
}
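In the pipeline output, `rec_texts` and `rec_scores` are parallel lists: the score at index i belongs to the text at index i. The sketch below pairs them and keeps only lines at or above a threshold. It assumes the result is available as a plain dict with the keys shown above; the helper name `confident_lines` is hypothetical, and the threshold in the demo is chosen only so that the two sample lines fall on different sides of it.

```python
def confident_lines(res, min_score):
    """Pair each recognized line with its score and drop low-confidence lines."""
    texts = res["rec_texts"]
    scores = res["rec_scores"]
    if len(texts) != len(scores):
        raise ValueError("rec_texts and rec_scores must have the same length")
    return [(text, score) for text, score in zip(texts, scores) if score >= min_score]


# First and last entries of the output above (the middle ones are elided there).
sample_res = {
    "rec_texts": [
        "Algorithms for the Markov Entropy Decomposition",
        "and determine critical points, before concluding in Sec. VI.",
    ],
    "rec_scores": [0.99388635, 0.99304372],
}

if __name__ == "__main__":
    for text, score in confident_lines(sample_res, min_score=0.9935):
        print(f"{score:.4f}  {text}")
```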
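If save_path is specified, the visualization results are saved under save_path. The visualization output is shown below:
The command-line method is convenient for a quick try. For project integration, only a few lines of code are needed as well: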
from paddleocr import PaddleOCR
ocr = PaddleOCR(
    text_recognition_model_name="PP-OCRv5_mobile_rec",
    use_doc_orientation_classify=False,  # Enable/disable the document orientation classification model
    use_doc_unwarping=False,  # Enable/disable the document unwarping module
    use_textline_orientation=True,  # Enable/disable the text line orientation classification model
    device="gpu:0",  # Device used for model inference (here, GPU 0)
)
result = ocr.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png")
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")
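The default recognition model used in the pipeline is PP-OCRv5_server_rec, so you need to specify PP-OCRv5_mobile_rec via the text_recognition_model_name argument. You can also use local model files via the text_recognition_model_dir argument. For details about usage commands and parameter descriptions, please refer to the documentation.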
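One way to combine the two arguments is to build the keyword arguments first and add `text_recognition_model_dir` only when a local directory actually exists. This is a sketch under assumptions: the helper `recognition_kwargs` and the directory path in the usage comment are hypothetical, while the two argument names come from the paragraph above.

```python
from pathlib import Path


def recognition_kwargs(local_model_dir=None):
    """Build recognition-model keyword arguments for a pipeline constructor.

    The model name is always set; the local directory is added only if it
    exists, so the default model download is used otherwise.
    """
    kwargs = {"text_recognition_model_name": "PP-OCRv5_mobile_rec"}
    if local_model_dir is not None and Path(local_model_dir).is_dir():
        kwargs["text_recognition_model_dir"] = str(local_model_dir)
    return kwargs


if __name__ == "__main__":
    # Hypothetical path; replace it with the directory holding your model files.
    print(recognition_kwargs("./PP-OCRv5_mobile_rec_infer"))
    # The result could then be passed on, for example:
    #   ocr = PaddleOCR(**recognition_kwargs("./PP-OCRv5_mobile_rec_infer"))
```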
PP-StructureV3
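Layout analysis is a technique for extracting structured information from document images. PP-StructureV3 includes the following six modules:
- Layout Detection Module
- General OCR Pipeline
- Document Image Preprocessing Pipeline (Optional)
- Table Recognition Pipeline (Optional)
- Seal Recognition Pipeline (Optional)
- Formula Recognition Pipeline (Optional)
Run a single command to quickly try the PP-StructureV3 pipeline: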
paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
--text_recognition_model_name PP-OCRv5_mobile_rec \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation False \
--device gpu:0
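The results are printed to the terminal. If save_path is specified, the results are saved under save_path. The predicted Markdown visualization is shown below:
A few lines of code are enough to run pipeline inference. Taking the PP-StructureV3 pipeline as an example: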
from paddleocr import PPStructureV3
pipeline = PPStructureV3(
    text_recognition_model_name="PP-OCRv5_mobile_rec",
    use_doc_orientation_classify=False,  # Enable/disable the document orientation classification model
    use_doc_unwarping=False,  # Enable/disable the document unwarping module
    use_textline_orientation=False,  # Enable/disable the text line orientation classification model
    device="gpu:0",  # Device used for model inference (here, GPU 0)
)
output = pipeline.predict("./pp_structure_v3_demo.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the structured result of the current image as JSON
    res.save_to_markdown(save_path="output")  # Save the result of the current image as Markdown
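The default recognition model used in the pipeline is PP-OCRv5_server_rec, so you need to specify PP-OCRv5_mobile_rec via the text_recognition_model_name argument. You can also use local model files via the text_recognition_model_dir argument. For details about usage commands and parameter descriptions, please refer to the documentation.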
📚 Detailed Documentation
📄 License
This project is licensed under the Apache-2.0 License.











