
Florence 2 DocVQA

Developed by HuggingFaceM4
A version of Microsoft's Florence-2 model, fine-tuned for one day on 5% of the Docmatix dataset with a learning rate of 1e-6.
Downloads 3,096
Release date: 6/21/2024

Model Overview

A multimodal model fine-tuned from Florence-2-large-ft for image-text-to-text tasks such as document visual question answering.

Model Features

Multimodal understanding
Capable of processing combined image and text inputs to generate relevant text outputs
Efficient fine-tuning
Fine-tuned using only 5% of the Docmatix dataset with a learning rate of 1e-6
Based on Florence-2 architecture
Built on Microsoft's Florence-2 foundation model
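
The fine-tuning setup described above can be written down as a small configuration. This is only a sketch: the learning rate, the data fraction, and the base checkpoint name come from this page, while the `seed` field, the `FineTuneConfig` and `subset_indices` names, and the use of uniform random sampling are assumptions, since the page does not say how the 5% subset was chosen.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    base_model: str = "Florence-2-large-ft"  # base checkpoint named on this page
    learning_rate: float = 1e-6              # stated on this page
    data_fraction: float = 0.05              # 5% of Docmatix, stated on this page
    seed: int = 0                            # hypothetical, for reproducibility only


def subset_indices(num_examples: int, config: FineTuneConfig) -> list[int]:
    """Pick a reproducible random subset covering config.data_fraction of the data.

    Uniform random sampling is an assumption; the actual selection method
    used for this model is not documented here.
    """
    subset_size = round(num_examples * config.data_fraction)
    rng = random.Random(config.seed)
    return sorted(rng.sample(range(num_examples), subset_size))
```

For example, `subset_indices(1000, FineTuneConfig())` returns 50 distinct indices, and the same indices on every call because the seed is fixed.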

Model Capabilities

Image-text understanding
Multimodal content generation
Visual question answering
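
As an illustration of visual question answering with this model, the sketch below asks one question about one document image through the Hugging Face `transformers` library. Several details are assumptions not stated on this page: the Hub repository id `HuggingFaceM4/Florence-2-DocVQA`, the `<DocVQA>` task prefix, the generation settings, and the helper names `build_prompt` and `answer_question`. Florence-2 checkpoints ship custom modeling code, which is why `trust_remote_code=True` is passed.

```python
from typing import Any

MODEL_ID = "HuggingFaceM4/Florence-2-DocVQA"  # assumed Hub repository id
TASK_PREFIX = "<DocVQA>"                      # assumed task token for document QA


def build_prompt(question: str) -> str:
    """Prepend the assumed task token to the whitespace-stripped question."""
    return TASK_PREFIX + question.strip()


def answer_question(image: Any, question: str, max_new_tokens: int = 128) -> str:
    """Answer one question about one RGB PIL image and return the decoded text."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Loading on every call keeps the sketch short; cache these in real use.
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    model.eval()

    inputs = processor(text=build_prompt(question), images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,
        num_beams=3,  # illustrative setting, not taken from this page
    )
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

A call might look like `answer_question(Image.open("invoice.png").convert("RGB"), "What is the invoice total?")`, where `invoice.png` is a placeholder file name and `Image` comes from Pillow.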

Use Cases

Document processing
Document image understanding
Extract and understand text content from scanned document images
Content generation
Image caption generation
Generate descriptive text based on input images