
Florence 2 Large Ft

Developed by andito
Florence-2 is an advanced vision foundation model developed by Microsoft that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
Downloads: 93
Release time: 6/21/2024

Model Overview

Florence-2 performs tasks such as image captioning, object detection, and segmentation in response to simple text prompts. It is trained with multi-task learning on the large-scale FLD-5B dataset.

Model Features

Unified visual representation
Handles multiple vision tasks with a single model, reducing the need for specialized models
Prompt-driven
Switches between different task modes through simple text prompts
Large-scale pretraining
Trained on the FLD-5B dataset (126 million images, 5.4 billion annotations)
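To put the FLD-5B figures in perspective, dividing the annotation count by the image count gives the average number of annotations per image. This is plain arithmetic on the two numbers quoted above; it says nothing about how the annotations are distributed across images or annotation types.

```python
# Dataset sizes as quoted for FLD-5B.
NUM_IMAGES = 126_000_000
NUM_ANNOTATIONS = 5_400_000_000

# Average number of annotations per image (a mean only, not a distribution).
annotations_per_image = NUM_ANNOTATIONS / NUM_IMAGES
print(f"{annotations_per_image:.1f} annotations per image on average")
```

This works out to roughly 43 annotations per image on average.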

Model Capabilities

Image caption generation
Object detection
Image segmentation
Text recognition
Visual question answering
Referring expression comprehension
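Because the model is prompt-driven, switching between these capabilities amounts to changing the task prompt. The sketch below maps the capabilities listed above to task tokens of the kind used in the Florence-2 examples (such as <CAPTION>, <OD> and <OCR>). The choice of token for each capability, the NEEDS_TEXT set and the build_prompt helper are assumptions made for illustration, so check the official prompt list before relying on them. Visual question answering is left out because this sketch does not assume a dedicated token for it.

```python
from typing import Optional

# Assumed capability -> task-token mapping (illustrative; verify against the
# prompt list published with the Florence-2 checkpoints).
TASK_TOKENS = {
    "image caption generation": "<CAPTION>",
    "detailed caption": "<DETAILED_CAPTION>",
    "object detection": "<OD>",
    "image segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "text recognition": "<OCR>",
    # Assumption: phrase grounding used as the closest token for this capability.
    "referring expression comprehension": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Hypothetical: tokens assumed to require a trailing text input (phrase or expression).
NEEDS_TEXT = {"<REFERRING_EXPRESSION_SEGMENTATION>", "<CAPTION_TO_PHRASE_GROUNDING>"}


def build_prompt(capability: str, text_input: Optional[str] = None) -> str:
    """Return the prompt string: the task token, followed by any text input."""
    token = TASK_TOKENS[capability.lower()]
    if token in NEEDS_TEXT and not text_input:
        raise ValueError(f"{token} expects a text input")
    if token not in NEEDS_TEXT and text_input:
        raise ValueError(f"{token} does not take a text input")
    return token if text_input is None else token + text_input
```

For example, build_prompt("Object detection") returns "<OD>", and build_prompt("Referring expression comprehension", "a red car") returns "<CAPTION_TO_PHRASE_GROUNDING>a red car". In the Hugging Face usage pattern, a string like this is passed to the model's processor as the text input together with the image.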

Use Cases

Content understanding
Automatic image tagging
Generates detailed descriptions for images
Achieves a CIDEr score of 143.3 on the COCO caption test set
Visual analysis
Object detection
Identifies objects and their locations in images
Zero-shot mAP of 37.5 on the COCO detection validation set
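COCO detection mAP is built on intersection over union (IoU). A predicted box counts as a match for a ground-truth box only if their IoU clears a threshold, and COCO's headline mAP averages precision over ten thresholds from 0.50 to 0.95 (as well as over categories). The sketch below computes IoU and lists the thresholds a single prediction would clear. The detection dict imitates the bboxes/labels layout that Florence-2's post-processing returns for the <OD> prompt, but the boxes and the ground-truth annotation are invented. Full mAP also requires ranking predictions and integrating precision over recall, which this sketch omits.

```python
from typing import Dict, List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO's primary mAP averages AP over these ten IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred: Box, gt: Box) -> List[float]:
    """IoU thresholds at which `pred` would count as a match for `gt`."""
    score = iou(pred, gt)
    return [t for t in COCO_IOU_THRESHOLDS if score >= t]


# Hypothetical <OD>-style output and annotation (values invented for illustration).
detections: Dict[str, Dict[str, list]] = {
    "<OD>": {"bboxes": [[10.0, 10.0, 110.0, 110.0]], "labels": ["car"]}
}
ground_truth = {"car": [20.0, 10.0, 120.0, 110.0]}

for box, label in zip(detections["<OD>"]["bboxes"], detections["<OD>"]["labels"]):
    gt_box = ground_truth[label]
    print(label, round(iou(box, gt_box), 3), thresholds_passed(box, gt_box))
```

With these example boxes the overlap is 9000 of a combined 11000 square pixels, so the IoU is about 0.818: the prediction clears the thresholds from 0.50 up to 0.80 but not 0.85 and above.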