Florence 2 Large

Developed by lodestone-horizon
Florence-2 is an advanced vision foundation model developed by Microsoft, employing a prompt-based approach to handle a wide range of visual and vision-language tasks.
Release date: 6/19/2024

Model Overview

Florence-2 is a unified visual representation model that performs vision tasks such as image captioning, object detection, and segmentation in response to simple text prompts. It is pre-trained on the large-scale FLD-5B dataset and is reported to perform well both zero-shot and after fine-tuning.

Model Features

Unified Visual Representation
Handles multiple vision tasks with a single model, eliminating the need for separate models per task.
Prompt-based Task Execution
Switches between different task modes using simple text prompts (e.g., <OD>, <CAPTION>).
Large-scale Pre-training
Pre-trained on the FLD-5B dataset containing 126 million images and 5.4 billion annotations.
Strong Zero-shot Capability
Performs well without task-specific training, for example 37.5 mAP on COCO detection validation in the zero-shot setting.
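
The prompt-based task switching described above can be sketched as follows. This is a minimal illustration, not the official API: the helper names (TASK_TOKENS, NEEDS_TEXT, build_prompt, run_task) are hypothetical, only <OD> and <CAPTION> come from this page, and the remaining task tokens are assumed from the upstream Florence-2 model card and should be verified against the checkpoint in use. run_task assumes a model and processor already loaded through Hugging Face transformers and a PIL-style image exposing width and height.

```python
# Map readable task names to Florence-2 prompt tokens.
# <OD> and <CAPTION> appear on this page; the other tokens are ASSUMED
# from the upstream Florence-2 model card and should be verified.
TASK_TOKENS = {
    "caption": "<CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "dense_region_caption": "<DENSE_REGION_CAPTION>",
    "region_proposal": "<REGION_PROPOSAL>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tasks assumed to require extra text appended directly after the token.
NEEDS_TEXT = {"phrase_grounding"}


def build_prompt(task, text=None):
    """Return the prompt string that selects a task mode."""
    if task not in TASK_TOKENS:
        raise KeyError(f"unknown task: {task!r}")
    token = TASK_TOKENS[task]
    if task in NEEDS_TEXT:
        if not text:
            raise ValueError(f"task {task!r} needs a text input")
        return token + text
    if text:
        raise ValueError(f"task {task!r} takes no text input")
    return token


def run_task(model, processor, image, task, text=None, max_new_tokens=1024):
    """Run one task on one image (hypothetical helper).

    Assumes `model` and `processor` were loaded beforehand and that
    `image` exposes `.width` and `.height` (e.g. a PIL image).
    """
    prompt = build_prompt(task, text)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,
    )
    # Keep special tokens: the post-processing step parses location tokens.
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        raw,
        task=TASK_TOKENS[task],
        image_size=(image.width, image.height),
    )
```

With the upstream checkpoints, the model and processor are typically loaded with AutoModelForCausalLM.from_pretrained and AutoProcessor.from_pretrained, passing trust_remote_code=True; the exact repository id for this upload is not stated on this page. Only the prompt changes between tasks, so run_task(model, processor, image, "object_detection") and run_task(model, processor, image, "caption") reuse the same loaded weights.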

Model Capabilities

Image caption generation
Object detection
Image segmentation
Text recognition
Dense region description
Region proposal
Referring expression comprehension
Visual question answering

Use Cases

Computer Vision
Automatic Image Captioning
Generates descriptive text for images
CIDEr score of 135.6 on COCO caption test set
Smart Object Detection
Detects and localizes objects in images
COCO detection validation mAP 37.5 (zero-shot)
Document Processing
Text Recognition
Extracts text content from images
Human-Computer Interaction
Visual Question Answering
Answers questions about image content
VQAv2 test accuracy 81.7 (after fine-tuning)