Florence 2 Base

Developed by Microsoft
Florence-2 is an advanced vision foundation model developed by Microsoft, employing a prompt-based approach to handle a wide range of vision and vision-language tasks.
Downloads 316.74k
Release Time: 6/15/2024

Model Overview

Florence-2 is a vision foundation model that takes an image together with a short text prompt naming the task to perform. The same model handles image captioning, object detection, segmentation, and other vision and vision-language tasks; switching tasks means changing the prompt, not the model.
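As a rough illustration of the prompt-based workflow, the sketch below runs one task on one image through the Hugging Face transformers library. It makes several assumptions that this page does not state: that the checkpoint is published under the repo id microsoft/Florence-2-base, that it is loaded with trust_remote_code=True, and that the processor exposes post_process_generation. The function name run_florence2 is hypothetical, and the generation settings are illustrative only.

```python
MODEL_ID = "microsoft/Florence-2-base"  # assumed Hugging Face repo id


def run_florence2(image, task_prompt, text_input="", model_id=MODEL_ID):
    """Run one Florence-2 task on a PIL image and return the parsed result.

    Hypothetical helper: it reloads the model on every call, which keeps the
    sketch short but is wasteful in real use.
    """
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    # Assumption: the checkpoint ships its own modeling code.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    # The task token (for example "<OD>") comes first, then any extra text.
    prompt = task_prompt + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, dtype)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
    )
    # Special tokens are kept because the post-processor parses them.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text, task=task_prompt, image_size=(image.width, image.height)
    )
```

Under these assumptions, calling run_florence2(image, "<CAPTION>") with a PIL image would return a dictionary keyed by the task token. In practice the model and processor would be loaded once and reused across calls.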

Model Features

Unified Multi-task Processing
A single model performs image captioning, object detection, segmentation, and other vision tasks; a simple text prompt selects the task.
Large-scale Pretraining
Pretrained on the FLD-5B dataset, which contains 126 million images and 5.4 billion annotations.
Zero-shot Learning Capability
Performs well in zero-shot settings, handling a range of vision tasks without task-specific fine-tuning.

Model Capabilities

Image captioning
Object detection
Image segmentation
Text recognition
Region proposal
Dense region description
Phrase localization from description
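Each capability listed above is selected by a task token placed at the start of the prompt. The sketch below pairs the capabilities with the task tokens documented for the released checkpoints; the pairing itself is an assumption of this example (for instance, "Image segmentation" is mapped to the referring-expression variant), and the names TASK_PROMPTS, TASKS_WITH_TEXT_INPUT, and build_prompt are hypothetical.

```python
# Capability name (as listed on this page) -> Florence-2 task token.
# The pairing is an assumption of this sketch, not taken from this page.
TASK_PROMPTS = {
    "Image captioning": "<CAPTION>",
    "Object detection": "<OD>",
    "Image segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "Text recognition": "<OCR>",
    "Region proposal": "<REGION_PROPOSAL>",
    "Dense region description": "<DENSE_REGION_CAPTION>",
    "Phrase localization from description": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tasks whose prompt carries extra text after the task token.
TASKS_WITH_TEXT_INPUT = {
    "<REFERRING_EXPRESSION_SEGMENTATION>",
    "<CAPTION_TO_PHRASE_GROUNDING>",
}


def build_prompt(capability, text_input=""):
    """Return the prompt string for a capability, checking the text input."""
    task = TASK_PROMPTS[capability]
    if task in TASKS_WITH_TEXT_INPUT:
        if not text_input:
            raise ValueError(f"{task} needs a text input, e.g. a phrase to locate")
        return task + text_input
    if text_input:
        raise ValueError(f"{task} does not take a text input")
    return task


print(build_prompt("Object detection"))
print(build_prompt("Phrase localization from description", "A green car"))
```

Keeping the mapping in one table makes it easy to see that changing task only changes the prompt string passed to the model.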

Use Cases

Computer Vision
Image Content Description
Generate detailed descriptions for images
Achieves a CIDEr score of 133.0 on the COCO captioning task
Object Detection
Detect and localize objects in images
Achieves 34.7 mAP on the COCO detection task
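To make the detection output concrete, the sketch below flattens a parsed detection result into one record per object. It assumes the result is a dictionary keyed by the task token "<OD>", holding parallel "bboxes" (pixel coordinates as x1, y1, x2, y2) and "labels" lists. The helper names and the sample values are hypothetical and for illustration only; they are not model output.

```python
def detections_from_od(parsed, task="<OD>"):
    """Flatten a parsed detection result into a list of label/box records."""
    result = parsed[task]
    return [
        {"label": label, "box": tuple(float(v) for v in box)}
        for label, box in zip(result["labels"], result["bboxes"])
    ]


def count_by_label(detections):
    """Count how many detections carry each label."""
    counts = {}
    for det in detections:
        counts[det["label"]] = counts.get(det["label"], 0) + 1
    return counts


# Made-up sample in the assumed output shape.
sample = {
    "<OD>": {
        "bboxes": [[34.0, 160.0, 597.0, 371.0], [454.0, 96.0, 580.0, 261.0]],
        "labels": ["car", "door"],
    }
}

detections = detections_from_od(sample)
print(detections[0]["label"], detections[0]["box"])
print(count_by_label(detections))
```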
Vision-Language Tasks
Visual Question Answering
Answer questions about image content
Achieves 81.7% accuracy on the VQAv2 task
Referring Expression Comprehension
Locate the image region that a text expression describes
Achieves 93.4% accuracy on the RefCOCO task
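This page does not state how the referring expression accuracy is computed. A common convention for RefCOCO-style benchmarks, assumed here, counts a prediction as correct when its box overlaps the ground-truth box with an intersection over union (IoU) of at least 0.5. The sketch below implements that convention; the function names and the boxes used in the example are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(predictions, ground_truths, threshold=0.5):
    """Fraction of predicted boxes whose IoU with the paired truth meets the threshold."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        return 0.0
    hits = sum(
        box_iou(p, g) >= threshold for p, g in zip(predictions, ground_truths)
    )
    return hits / len(predictions)


# Illustrative boxes: the first pair matches exactly, the second overlaps by one third.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
truths = [(0, 0, 10, 10), (5, 0, 15, 10)]
print(accuracy_at_iou(preds, truths))
```

In the second pair the intersection is 50 and the union is 150, so the IoU is one third and the prediction counts as a miss at the 0.5 threshold.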