Olmocr 7B 0225 Preview

Developed by FriendliAI
A document OCR model fine-tuned from Qwen2-VL-7B-Instruct that supports multilingual document recognition and metadata extraction
Downloads 322
Release Time: 2/28/2025

Model Overview

A multimodal model optimized for document OCR. It processes single-page document images and extracts their text content along with document structure information.

Model Features

Multimodal Document Understanding
Combines visual and language model capabilities to process both image and text information simultaneously
Metadata Extraction
Identifies the document language, the rotation correction a page needs, whether the page contains tables or charts, and other structured information
Efficient Inference Support
Supports batch processing of large volumes of documents through the sglang framework
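
The metadata described above could be consumed downstream roughly as follows. This is a minimal sketch, not the model's documented output schema: it assumes the model's response for one page arrives as a JSON string, and every field name (`language`, `rotation_correction`, `has_table`, `has_chart`, `text`) is hypothetical and chosen only for illustration. The restriction of rotation values to multiples of 90 degrees is likewise an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageResult:
    # All field names are hypothetical; check the real output schema before use.
    language: Optional[str]      # detected document language, if reported
    rotation_correction: int     # degrees the page should be rotated (assumed 0/90/180/270)
    has_table: bool              # whether a table was detected on the page
    has_chart: bool              # whether a chart was detected on the page
    text: str                    # extracted text content


def parse_page_result(raw: str) -> PageResult:
    """Parse one page's JSON response into a typed record."""
    data = json.loads(raw)
    rotation = int(data.get("rotation_correction", 0))
    # Assumption: rotation is reported in quarter turns.
    if rotation not in (0, 90, 180, 270):
        raise ValueError(f"unexpected rotation_correction: {rotation}")
    return PageResult(
        language=data.get("language"),
        rotation_correction=rotation,
        has_table=bool(data.get("has_table", False)),
        has_chart=bool(data.get("has_chart", False)),
        text=data.get("text") or "",
    )
```

Validating the record at the parsing boundary keeps malformed responses from propagating into a large batch run, where they are harder to trace back to a specific page.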

Model Capabilities

Document Image Recognition
Multilingual Text Extraction
Document Structure Analysis
Metadata Generation
Table Detection
Chart Detection

Use Cases

Academic Research
Paper Digitization
Convert academic paper PDFs into structured digital content
Extract text content and paper metadata
Enterprise Document Processing
Contract Parsing
Automatically identify key clauses and structures in contract documents
Generate structured contract data
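
Because the model works on single-page images, digitizing a multi-page paper or contract means recognizing each page separately and then stitching the results together. The sketch below shows only that final stitching step under stated assumptions: `assemble_document` is a hypothetical helper, and the per-page text is assumed to have already been extracted and paired with its 1-based page number.

```python
from typing import Iterable, Tuple


def assemble_document(pages: Iterable[Tuple[int, str]]) -> str:
    """Join per-page text into one document, ordered by page number.

    `pages` yields (page_number, extracted_text) pairs in any order.
    Pages whose text is empty or whitespace-only are skipped.
    """
    ordered = sorted(pages, key=lambda page: page[0])
    parts = [text.strip() for _, text in ordered if text.strip()]
    # Separate pages with a blank line so paragraph boundaries survive.
    return "\n\n".join(parts)
```

Sorting by page number lets pages be recognized in any order, for example when a batch job returns results out of sequence.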