Dolphin

Developed by ByteDance
Dolphin is an innovative multimodal document image parsing model that adopts an 'analyze first, parse later' paradigm to handle complex document elements.
Downloads 1,620
Release Time: 5/19/2025

Model Overview

Dolphin is a multimodal model for document image parsing, capable of processing complex interwoven document elements such as text paragraphs, charts, formulas, and tables. It achieves comprehensive page-level layout analysis and efficient element-level parsing through a two-stage approach.

Model Features

Two-stage parsing method
Performs page-level layout analysis first, followed by element-level parsing, effectively handling complex document structures
Heterogeneous anchor prompts
Uses natural language prompts to control parsing tasks, improving parsing efficiency and accuracy
Parallel parsing mechanism
Lightweight architecture supports parallel parsing of multiple document elements, enhancing processing efficiency
Multimodal capability
Simultaneously processes visual and textual information, suitable for complex document understanding tasks
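The two-stage, parallel design described in the features above can be illustrated with a minimal Python sketch. It is not Dolphin's code or API. `Element`, `PROMPTS`, `analyze_layout`, `parse_element`, and `parse_page` are hypothetical names, both model stages are replaced by stubs, and the prompt strings are placeholders. The sketch shows only the control flow: stage 1 returns the page's elements in reading order, and stage 2 parses those elements independently and in parallel, each guided by a prompt chosen for its element type.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


# Hypothetical record for one element found by stage 1 (layout analysis).
@dataclass
class Element:
    kind: str    # element type, e.g. "text", "table", "formula"
    bbox: tuple  # (x0, y0, x1, y1) in page pixels


# Placeholder type-specific prompts (assumption: not Dolphin's real strings).
PROMPTS = {
    "text": "Read text in the image.",
    "table": "Parse the table in the image.",
    "formula": "Read formula in the image.",
}


def analyze_layout(page):
    """Stage 1 stub: return elements in reading order.
    A real model would predict these from the page image."""
    return page["elements"]


def parse_element(page, element):
    """Stage 2 stub: a real model would crop the page to element.bbox
    and decode its content conditioned on the type-specific prompt."""
    prompt = PROMPTS[element.kind]
    return f"[{element.kind}] parsed with prompt: {prompt!r}"


def parse_page(page, max_workers=4):
    elements = analyze_layout(page)  # stage 1: analyze first
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Stage 2: parse later. Elements do not depend on each other,
        # so they can be parsed concurrently.
        results = list(pool.map(lambda e: parse_element(page, e), elements))
    # pool.map yields results in input order, so reading order is kept.
    return list(zip(elements, results))


if __name__ == "__main__":
    page = {"elements": [Element("text", (0, 0, 100, 20)),
                         Element("table", (0, 30, 100, 80))]}
    for element, text in parse_page(page):
        print(element.kind, "->", text)
```

Because `ThreadPoolExecutor.map` returns results in the order of its inputs, the parsed elements come back in reading order even if they finish out of order. A real system might instead batch the element crops through the model together. Threads are used here only to keep the sketch self-contained.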

Model Capabilities

Document image parsing
Layout analysis
Table extraction
Optical character recognition
Formula recognition
Chart understanding
Multimodal processing

Use Cases

Document digitization
Scanned document parsing
Convert scanned PDFs or images into structured digital documents
Preserves the original document's layout and content structure
Information extraction
Table data extraction
Extract table data from document images and convert it into a structured format
High-precision table structure recognition and data extraction
Formula recognition
Identify mathematical formulas in documents and convert them into an editable format
Supports recognition of complex mathematical symbols and structures
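For the table extraction use case, a downstream step usually turns the parser's table output into rows that a spreadsheet or database can load. The following is a minimal sketch that assumes the table is emitted as HTML markup (an assumption, since this page does not name the output format). `TableRows` and `table_html_to_rows` are hypothetical helpers built on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows (lists of cell strings)
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Ignore whitespace and text that sits outside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_html_to_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


html = ("<table><tr><th>Name</th><th>Qty</th></tr>"
        "<tr><td>bolt</td><td> 4 </td></tr></table>")
print(table_html_to_rows(html))  # → [['Name', 'Qty'], ['bolt', '4']]
```

This sketch ignores `colspan` and `rowspan`, so merged cells would need extra handling before the rows are written to CSV or a database.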