
Nanonets OCR S GGUF

Developed by Mungert
Nanonets-OCR-s is an OCR model that converts document images into structured Markdown. Beyond plain text extraction, it performs intelligent content recognition and semantic tagging.
Downloads: 1,044
Release Time: 6/14/2025

Model Overview

Nanonets-OCR-s is an advanced OCR model designed specifically for converting documents into structured Markdown. It not only extracts text but also recognizes and tags complex content such as tables, formulas, images, signatures, and watermarks, which makes its output well suited to downstream processing by large language models (LLMs).

Model Features

LaTeX Formula Recognition
Automatically convert mathematical formulas into correctly formatted LaTeX syntax, distinguishing between inline formulas and display formulas.
Intelligent Image Description
Use structured <img> tags to describe images in the document, making them easy for large language models to process.
Signature Detection and Isolation
Recognize signatures, separate them from the surrounding text, and output them in a <signature> tag; this is useful for legal and business documents.
Watermark Extraction
Detect and extract watermark text from the document and place it in the <watermark> tag.
Intelligent Checkbox Processing
Convert form checkboxes and radio buttons into standardized Unicode symbols (☐, ☑, ☒) for consistent and reliable processing.
Complex Table Extraction
Accurately extract complex tables from the document and convert them into Markdown and HTML table formats.
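Because the model marks signatures, watermarks, images, and checkboxes explicitly, its output can be post-processed with a few regular expressions. The sketch below is a hypothetical Python example, not part of Nanonets-OCR-s or its GGUF release. It assumes the output wraps content in paired <img>...</img>, <signature>...</signature>, and <watermark>...</watermark> tags with no attributes, and that checkboxes appear as ☐, ☑, ☒ followed by their label on the same line. The names parse_ocr_output and ParsedPage, and the state labels "unchecked", "checked", and "crossed", are illustrative assumptions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: tag names follow the model card's description; real output may differ.
TAG_RE = re.compile(r"<(img|signature|watermark)>(.*?)</\1>", re.DOTALL)

# Assumed meaning of the three checkbox symbols (U+2610, U+2611, U+2612).
CHECKBOX_STATES = {"\u2610": "unchecked", "\u2611": "checked", "\u2612": "crossed"}

# A checkbox symbol, optional spaces/tabs, then its label up to the next symbol or line end.
CHECKBOX_RE = re.compile(r"([\u2610-\u2612])[ \t]*([^\u2610-\u2612\n]*)")


@dataclass
class ParsedPage:
    body: str  # Markdown left over after the tagged regions are removed
    images: list[str] = field(default_factory=list)
    signatures: list[str] = field(default_factory=list)
    watermarks: list[str] = field(default_factory=list)
    checkboxes: list[tuple[str, str]] = field(default_factory=list)  # (state, label)


def parse_ocr_output(text: str) -> ParsedPage:
    """Split hypothetical Nanonets-style output into body text and tagged regions."""
    page = ParsedPage(body="")
    buckets = {"img": page.images, "signature": page.signatures, "watermark": page.watermarks}
    for match in TAG_RE.finditer(text):
        buckets[match.group(1)].append(match.group(2).strip())
    # Remove the tagged regions, then collapse the blank lines they leave behind.
    body = TAG_RE.sub("", text)
    page.body = re.sub(r"\n{3,}", "\n\n", body).strip()
    # Record each checkbox as a (state, label) pair.
    for symbol, label in CHECKBOX_RE.findall(page.body):
        page.checkboxes.append((CHECKBOX_STATES[symbol], label.strip()))
    return page


if __name__ == "__main__":
    sample = (
        "Name: Jane Doe\n"
        "\u2611 I agree \u2610 I do not agree\n"
        "<signature>J. Doe</signature>\n"
        "<watermark>CONFIDENTIAL</watermark>\n"
    )
    page = parse_ocr_output(sample)
    print(page.checkboxes)
    print(page.signatures, page.watermarks)
```

Keeping the tagged regions in separate lists leaves the body as clean Markdown for an LLM, while signatures and watermarks stay available for separate review.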

Model Capabilities

Document Conversion
Text Extraction
Table Recognition
Formula Recognition
Image Description
Signature Detection
Watermark Extraction
Checkbox Processing

Use Cases

Document Processing
PDF to Markdown
Convert PDF documents into structured Markdown format, retaining the layout and content of the original document.
Generate Markdown documents that are easy to process and edit.
Table Extraction
Extract complex tables from the document and convert them into HTML or Markdown format.
Retain the structure and content of the table for subsequent processing.
Academic Research
Formula Recognition
Recognize mathematical formulas in the document and convert them into LaTeX syntax.
Facilitate the editing and typesetting of academic papers.
Business Documents
Signature Detection
Recognize and isolate signatures in the document.
Facilitate the processing of legal and business documents.
Watermark Extraction
Detect and extract watermark text from the document.
Facilitate copyright management and document verification.
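When a table comes back as HTML, it can be rendered as a Markdown pipe table for the table-extraction use case. The following is a minimal sketch using only Python's standard-library html.parser. The names TableCollector, table_to_markdown, and html_tables_to_markdown are hypothetical, and the sketch assumes simple tables: colspan, rowspan, nested tables, and omitted closing </td> tags are not handled, and the first row is treated as the header.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of each <table>; colspan, rowspan, and nesting are ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[list[str]]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join text fragments and collapse runs of whitespace into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a <td>/<th> cell is kept.
        if self._cell is not None:
            self._cell.append(data)


def table_to_markdown(rows: list[list[str]]) -> str:
    """Render rows as a Markdown pipe table, using the first row as the header."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Escape pipes inside cells and pad short rows so every row has `width` cells.
    padded = [[c.replace("|", "\\|") for c in row] + [""] * (width - len(row)) for row in rows]
    header, *body = padded
    lines = ["| " + " | ".join(header) + " |", "| " + " | ".join(["---"] * width) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def html_tables_to_markdown(html: str) -> list[str]:
    """Convert every <table> found in an OCR result to a Markdown table string."""
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    return [table_to_markdown(table) for table in collector.tables]
```

Text outside table cells is ignored, so the function can be applied to a full page of OCR output and returns one Markdown string per table found.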