UVDoc Open-source Model - Correct Distortion and Tilt of Text Images and Improve Text Recognition Accuracy

UVDoc

Developed by PaddlePaddle

UVDoc is mainly used to perform geometric transformations on text images to correct problems such as distortion, tilt, and perspective distortion of documents in the images, thereby improving the accuracy of subsequent text recognition.

Text Recognition
Supports Multiple Languages
Open Source License: Apache-2.0
#Document image rectification #Geometric distortion correction #OCR preprocessing

Downloads 8,072

Release Time: 6/6/2025

Model Overview

UVDoc is a document image rectification model that corrects geometric problems in document images, such as warping, tilt, and perspective distortion, so that the subsequent text recognition step works on a flatter, better-aligned image.

Model Features

Geometric transformation correction

It can automatically detect and correct problems such as distortion, tilt, and perspective distortion in document images.

Integrated with PaddleOCR

It is seamlessly integrated with PaddleOCR and can be used as an OCR preprocessing step to improve recognition accuracy.

Pipeline processing

It can be used as a preprocessing module in the PP-StructureV3 document analysis pipeline, providing an end-to-end solution.

Model Capabilities

Document image rectification

Text recognition preprocessing

Geometric distortion repair

Use Cases

Document digitization

Scanned document rectification

Automatically rectify distorted documents scanned by a scanner or taken by a mobile phone.

CER 0.179 (DocUNet benchmark dataset)

OCR preprocessing

Used as a preprocessing module in the OCR system to improve recognition accuracy.

Structured document analysis

Integration with PP-StructureV3

Used as a preprocessing step in the document analysis pipeline.

Improve the recognition accuracy of structured elements such as tables and formulas

🚀 UVDoc

UVDoc is a text image rectification model. It applies geometric transformations to the image to correct document distortion, tilt, and perspective deformation, so that subsequent text recognition is more accurate.

🚀 Quick Start

📦 Installation

1. PaddlePaddle

Refer to the following commands to install PaddlePaddle using pip:

# for CUDA11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# for CUDA12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

# for CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/

For detailed PaddlePaddle installation instructions, refer to the PaddlePaddle official website.
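The three install commands above differ only in the package name and the index URL. As a minimal sketch, the choice can be captured in a small helper that builds the pip argument list. The function name `paddle_install_command` and the target labels `cu118`, `cu126`, and `cpu` are assumptions made for this illustration; only the package names, the version, and the URLs are taken from the commands above.

```python
import sys

# Package names and index URLs copied from the install commands above.
# The dictionary keys are labels chosen for this sketch, not official identifiers.
PADDLE_INDEXES = {
    "cu118": ("paddlepaddle-gpu", "https://www.paddlepaddle.org.cn/packages/stable/cu118/"),
    "cu126": ("paddlepaddle-gpu", "https://www.paddlepaddle.org.cn/packages/stable/cu126/"),
    "cpu": ("paddlepaddle", "https://www.paddlepaddle.org.cn/packages/stable/cpu/"),
}


def paddle_install_command(target: str, version: str = "3.0.0") -> list[str]:
    """Return the pip command (as an argument list) for the given target."""
    if target not in PADDLE_INDEXES:
        raise ValueError(
            f"unknown target {target!r}; expected one of {sorted(PADDLE_INDEXES)}"
        )
    package, index_url = PADDLE_INDEXES[target]
    # sys.executable keeps the install tied to the interpreter running this script.
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}", "-i", index_url]
```

The returned list can be passed to `subprocess.run(...)` or joined with spaces and printed for copy-pasting into a shell.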

2. PaddleOCR

Install the latest version of the PaddleOCR inference package from PyPI:

python -m pip install paddleocr

💻 Usage Examples

Basic Usage

You can quickly experience the functionality with a single command:

paddleocr text_image_unwarping --model_name UVDoc -i https://cdn-uploads.huggingface.co/production/uploads/63d7b8ee07cd1aa3c49a2026/SfMVKd0xnMII5KBDV6Mfz.jpeg
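The command above reads the sample image directly from its URL. For local use, the same image can be fetched with the Python standard library, as in the sketch below. The helper names `local_name` and `download` are hypothetical and introduced only for this example; the URL is the one used in the command above.

```python
import os
import urllib.request
from urllib.parse import urlparse

SAMPLE_URL = (
    "https://cdn-uploads.huggingface.co/production/uploads/"
    "63d7b8ee07cd1aa3c49a2026/SfMVKd0xnMII5KBDV6Mfz.jpeg"
)


def local_name(url: str) -> str:
    """Return the last path component of the URL, used as the local file name."""
    return os.path.basename(urlparse(url).path)


def download(url: str, dest_dir: str = ".") -> str:
    """Download url into dest_dir unless the file already exists; return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, local_name(url))
    if not os.path.exists(path):
        # Only hit the network when the file is not already present.
        urllib.request.urlretrieve(url, path)
    return path
```

Calling `download(SAMPLE_URL)` saves the file as `SfMVKd0xnMII5KBDV6Mfz.jpeg` in the current directory, which is the file name the Python example expects.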

You can also integrate the model inference of the TextImageUnwarping module into your project. Before running the following code, download the sample image to your local machine.

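from paddleocr import TextImageUnwarping

# Load the UVDoc rectification model
model = TextImageUnwarping(model_name="UVDoc")
# Run inference on the downloaded sample image
output = model.predict("SfMVKd0xnMII5KBDV6Mfz.jpeg", batch_size=1)
for res in output:
    res.print()  # Print the prediction result
    res.save_to_img(save_path="./output/")  # Save the rectified image
    res.save_to_json(save_path="./output/res.json")  # Save the result in JSON format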

After running, the obtained result is as follows:

{'res': {'input_path': 'doc_test.jpg', 'page_index': None, 'doctr_img': '...'}}

[Image: visualized rectification result]

For details about the command-line usage and parameter descriptions, refer to the PaddleOCR documentation.
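In practice, rectification is often applied to a whole folder of scans rather than a single file. The following is a minimal sketch of that, assuming a flat folder of images. The helper names `collect_images` and `rectify_folder` and the set of file suffixes are assumptions for illustration; the PaddleOCR calls are limited to those shown in the example above (`TextImageUnwarping`, `predict`, `save_to_img`).

```python
from pathlib import Path

# Assumed set of image suffixes for this sketch; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_images(folder: str) -> list[str]:
    """Return sorted paths of image files located directly inside folder."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def rectify_folder(folder: str, out_dir: str = "./output/") -> int:
    """Rectify every image in folder with UVDoc; return the number of results saved."""
    # Imported lazily so collect_images can be used without PaddleOCR installed.
    from paddleocr import TextImageUnwarping

    model = TextImageUnwarping(model_name="UVDoc")
    count = 0
    for path in collect_images(folder):
        # Same calls as the single-image example: predict, then save the image.
        for res in model.predict(path, batch_size=1):
            res.save_to_img(save_path=out_dir)
            count += 1
    return count
```

Loading the model once and reusing it across files avoids repeating the initialization cost for every image.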

Advanced Usage

A single model has limited capability. A pipeline that combines several models can handle more difficult problems in real-world scenarios.

PP-StructureV3

Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules:

Layout Detection Module
General OCR Sub-pipeline
Document Image Preprocessing Sub-pipeline (Optional)
Table Recognition Sub-pipeline (Optional)
Seal Recognition Sub-pipeline (Optional)
Formula Recognition Sub-pipeline (Optional)

You can quickly experience the PP-StructureV3 pipeline with a single command.

paddleocr pp_structurev3 --use_doc_unwarping True -i https://cdn-uploads.huggingface.co/production/uploads/63d7b8ee07cd1aa3c49a2026/KP10tiSZfAjMuwZUSLtRp.png

You can also run pipeline inference from Python with just a few lines of code. Taking the PP-StructureV3 pipeline as an example:

from paddleocr import PPStructureV3

pipeline = PPStructureV3(use_doc_unwarping=True)  # Set use_doc_unwarping to enable or disable the document unwarping module
output = pipeline.predict("./KP10tiSZfAjMuwZUSLtRp.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the current image's structured result in JSON format
    res.save_to_markdown(save_path="output")  # Save the current image's result in Markdown format

For details about the command-line usage and parameter descriptions, refer to the PaddleOCR documentation.

📚 Documentation

Property	Details
Model Type	UVDoc
CER	0.179

Note: CER was measured on the DocUNet benchmark dataset.
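For readers unfamiliar with the metric, CER (character error rate) is commonly defined as the character-level edit distance between the recognized text and the reference text, divided by the reference length; lower is better. The sketch below implements that common definition. It is an illustration only: the exact evaluation protocol behind the 0.179 figure (OCR engine, text normalization, averaging) is not described here, and the function names are chosen for this example.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp against a non-empty reference string."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `cer("abcd", "abxd")` is 0.25: one substituted character out of four reference characters.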

📄 License

This project is licensed under the Apache-2.0 license.
