Dolphin
Developed by ByteDance
Dolphin is a multimodal document image parsing model that adopts an 'analyze first, parse later' paradigm to handle complex document elements.
Downloads: 1,620
Release Time: 5/19/2025
Model Overview
Dolphin is a multimodal model for document image parsing that can process complex, interwoven document elements such as text paragraphs, charts, formulas, and tables. It uses a two-stage approach: page-level layout analysis of the whole page first, followed by efficient parsing of each individual element.
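The two-stage flow can be sketched as a small Python skeleton. This is a minimal illustration, not Dolphin's implementation: `analyze_layout` and `parse_element` are hypothetical callables standing in for the model's stage-1 and stage-2 inference, and the `Element` record (type, bounding box, parsed content) is an assumed data shape.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A bounding box as (x0, y0, x1, y1) in page pixel coordinates (assumed convention).
BBox = Tuple[int, int, int, int]


@dataclass
class Element:
    kind: str          # element type from layout analysis, e.g. "text", "table", "formula"
    bbox: BBox         # where the element sits on the page
    content: str = ""  # filled in during stage 2


def parse_page(
    page: object,
    analyze_layout: Callable[[object], List[Element]],
    parse_element: Callable[[object, Element], str],
) -> List[Element]:
    """Analyze the whole page first, then parse each element.

    `analyze_layout` and `parse_element` are hypothetical stand-ins for the
    model's stage-1 and stage-2 calls; they are not Dolphin's real API.
    """
    # Stage 1: page-level layout analysis, returning elements in reading order.
    elements = analyze_layout(page)
    # Stage 2: element-level parsing, one element at a time.
    for el in elements:
        el.content = parse_element(page, el)
    return elements


# Toy usage with stub functions in place of the model.
def fake_layout(page: object) -> List[Element]:
    return [Element("text", (0, 0, 100, 20)), Element("formula", (0, 30, 100, 60))]


def fake_parse(page: object, el: Element) -> str:
    return f"<{el.kind} at {el.bbox}>"


result = parse_page(None, fake_layout, fake_parse)
for el in result:
    print(el.kind, el.content)
```

Keeping the stages as separate callables mirrors the 'analyze first, parse later' split: the stage-2 function only ever sees one element at a time, so it does not have to reason about the full page layout.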
Model Features
Two-stage parsing method
Performs page-level layout analysis first, producing the page's elements in reading order, then parses each element individually, which helps it handle complex document structures
Heterogeneous anchor prompts
Pairs the elements found during layout analysis (the anchors) with task-specific natural-language prompts that control how each element is parsed, improving parsing efficiency and accuracy
Parallel parsing mechanism
Lightweight architecture supports parallel parsing of multiple document elements, enhancing processing efficiency
Multimodal capability
Simultaneously processes visual and textual information, suitable for complex document understanding tasks
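The anchor-prompt and parallel-parsing features can be illustrated together with a short Python sketch, assuming each element from layout analysis is identified by an ID and a type. The `PROMPTS` table, its prompt strings, the fallback to the text prompt, and `run_model` are all hypothetical placeholders, not Dolphin's actual prompts or API. A thread pool is used only to show that the (element, prompt) pairs are independent of one another; a real deployment might instead batch the element crops through the model in a single pass.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical type-specific prompts; not Dolphin's real prompt strings.
PROMPTS: Dict[str, str] = {
    "text": "Read the text in the image.",
    "table": "Parse the table in the image.",
    "formula": "Read the formula in the image.",
}
# Assumption: unknown element types fall back to the text prompt.
DEFAULT_PROMPT = PROMPTS["text"]


def build_requests(anchors: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair each anchor (element_id, element_type) with a type-specific prompt."""
    return [(elem_id, PROMPTS.get(kind, DEFAULT_PROMPT)) for elem_id, kind in anchors]


def parse_in_parallel(
    requests: List[Tuple[str, str]],
    run_model: Callable[[str, str], str],
    max_workers: int = 4,
) -> List[str]:
    """Parse independent (element, prompt) pairs concurrently.

    `Executor.map` returns results in input order, so the reading order
    produced by layout analysis is preserved in the output list.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda req: run_model(*req), requests))


# Toy usage with a stub "model" that just echoes its inputs.
anchors = [("e1", "text"), ("e2", "table"), ("e3", "formula")]
requests = build_requests(anchors)
outputs = parse_in_parallel(requests, lambda elem_id, prompt: f"{elem_id}: {prompt}")
for line in outputs:
    print(line)
```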
Model Capabilities
Document image parsing
Layout analysis
Table extraction
Optical character recognition
Formula recognition
Chart understanding
Multimodal processing
Use Cases
Document digitization
Scanned document parsing
Convert scanned PDFs or images into structured digital documents
Preserves the original document's layout and content structure
Information extraction
Table data extraction
Extract table data from document images and convert it into a structured format
High-precision table structure recognition and data extraction
Formula recognition
Identify mathematical formulas in documents and convert them into an editable format
Supports recognition of complex mathematical symbols and structures
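For the table extraction use case, a common final step is turning the parser's table markup into rows that a spreadsheet or database can ingest. The sketch below assumes the model emits a table as simple HTML (an assumption; check the actual output format) and uses only Python's standard `html.parser` and `csv` modules. The `TableToRows` class and `html_table_to_csv` function are hypothetical helpers written for this example, and they ignore `rowspan`/`colspan`, which a production converter would need to handle.

```python
import csv
import io
from html.parser import HTMLParser
from typing import List


class TableToRows(HTMLParser):
    """Collect the cell text of a simple HTML table into a list of rows.

    Handles <tr>, <td> and <th>; rowspan/colspan are ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr":
            self.rows.append(self._row)

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it per cell.
        if self._in_cell:
            self._cell.append(data)


def html_table_to_csv(html: str) -> str:
    """Convert one HTML table string into CSV text."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(parser.rows)
    return buf.getvalue()


# Hypothetical parser output for a small two-column table.
sample = (
    "<table><tr><th>Year</th><th>Revenue</th></tr>"
    "<tr><td>2024</td><td>1,200</td></tr></table>"
)
print(html_table_to_csv(sample))
```

Note that the `csv` writer quotes the value `1,200` because it contains the delimiter, so numeric cells with thousands separators survive the round trip.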
© 2025 AIbase