
Florence-2-large-ft

Developed by Microsoft
Florence-2 is a vision foundation model from Microsoft that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
Downloads: 269.44k
Release date: 6/15/2024

Model Overview

Florence-2 is a vision foundation model that performs tasks such as image captioning, object detection, and segmentation from simple text prompts. It was trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, and handles all of these tasks with a single set of weights rather than separate task-specific models.

Model Features

Multi-task Learning Capability
Handles multiple vision tasks including image captioning, object detection, and segmentation with a single model.
Prompt-based Task Execution
Executes different vision tasks through simple text prompts without requiring task-specific models.
Large-scale Pretraining
Pretrained on the FLD-5B dataset containing 126 million images and 5.4 billion annotations.
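The prompt-based workflow can be sketched with Hugging Face transformers, following the usage pattern published for the Florence-2 checkpoints: the model and processor are loaded once, and the task is selected only by the prompt string. This is a minimal sketch, not a definitive implementation. It assumes the checkpoint ID microsoft/Florence-2-large-ft, a transformers version that can load the checkpoint's remote code (trust_remote_code=True), and a processor that exposes post_process_generation. The names build_prompt, load_model, and run_task are hypothetical helpers.

```python
from typing import Any, Optional

# Assumed Hugging Face checkpoint ID for this model.
MODEL_ID = "microsoft/Florence-2-large-ft"


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Hypothetical helper: the task token, optionally followed by extra text
    (for example, a phrase for grounding or segmentation tasks)."""
    if not (task.startswith("<") and task.endswith(">")):
        raise ValueError(f"task must be a token such as '<CAPTION>', got {task!r}")
    return task if text_input is None else task + text_input


def load_model(model_id: str = MODEL_ID):
    """Load model and processor once. Imports are deferred so the helpers in
    this snippet can be used without transformers installed."""
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor


def run_task(model, processor, image, task: str, text_input: Optional[str] = None) -> Any:
    """Run one task on a PIL image; only the prompt changes between tasks."""
    prompt = build_prompt(task, text_input)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    # Keep special tokens (as in the published example) so location tokens
    # are still present for the post-processing step.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Convert the raw text into a task-specific result (captions, boxes, ...).
    return processor.post_process_generation(
        generated_text, task=task, image_size=(image.width, image.height)
    )
```

With model, processor = load_model() and a PIL image img, calling run_task(model, processor, img, "<CAPTION>") and then run_task(model, processor, img, "<OD>") reuses the same weights; only the task token differs.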

Model Capabilities

Image caption generation
Object detection
Image segmentation
Text recognition
Visual question answering
Dense region description
Region proposal
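Each capability above is selected by a task token used as the prompt. The mapping below is a sketch based on the task tokens shown in the public Florence-2 usage examples. Pairing each capability with a single token is an assumption: captioning also has <DETAILED_CAPTION> and <MORE_DETAILED_CAPTION> variants, and text recognition also has <OCR_WITH_REGION>. No dedicated visual question answering token appears in those examples, so that entry is left empty. task_for is a hypothetical helper.

```python
from typing import Dict, Optional

# Capability name -> task token (assumed from the public usage examples).
# None means no documented task token was found for that capability.
CAPABILITY_TO_TASK: Dict[str, Optional[str]] = {
    "Image caption generation": "<CAPTION>",
    "Object detection": "<OD>",
    # Expects a referring expression appended to the prompt (assumed).
    "Image segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "Text recognition": "<OCR>",
    "Visual question answering": None,
    "Dense region description": "<DENSE_REGION_CAPTION>",
    "Region proposal": "<REGION_PROPOSAL>",
}


def task_for(capability: str) -> str:
    """Return the task token for a capability, or raise if none is documented."""
    token = CAPABILITY_TO_TASK.get(capability)
    if token is None:
        raise KeyError(f"no documented task token for {capability!r}")
    return token
```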

Use Cases

Computer Vision
Automatic Image Tagging
Generates detailed descriptions for images, useful for content management and retrieval systems.
Achieves a CIDEr score of 143.3 on the COCO Caption test set
Intelligent Surveillance
Detects and recognizes objects in frames of surveillance video.
Achieves an mAP of 37.5 on the COCO detection validation set
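For surveillance-style use, detection results are usually aggregated per frame. The sketch below assumes the post-processed <OD> output has the shape used in the public usage examples: a dict holding parallel bboxes and labels lists, with boxes in pixel coordinates as [x1, y1, x2, y2]. count_detections and boxes_for are hypothetical helpers, and frame_result is made-up sample data, not model output.

```python
from collections import Counter
from typing import Dict, List

# Assumed shape of a post-processed "<OD>" result:
# {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": ["car", ...]}}


def count_detections(parsed: Dict, task: str = "<OD>") -> Counter:
    """Count detected objects per label in one post-processed result."""
    result = parsed[task]
    if len(result["bboxes"]) != len(result["labels"]):
        raise ValueError("bboxes and labels must have the same length")
    return Counter(result["labels"])


def boxes_for(parsed: Dict, label: str, task: str = "<OD>") -> List[List[float]]:
    """Return only the boxes whose label equals `label`."""
    result = parsed[task]
    return [box for box, lab in zip(result["bboxes"], result["labels"]) if lab == label]


# Hypothetical single-frame result, for illustration only.
frame_result = {
    "<OD>": {
        "bboxes": [[10.0, 20.0, 110.0, 220.0], [300.5, 40.0, 380.0, 200.0], [50.0, 60.0, 90.0, 140.0]],
        "labels": ["person", "car", "person"],
    }
}
print(count_detections(frame_result))  # Counter({'person': 2, 'car': 1})
print(boxes_for(frame_result, "car"))   # [[300.5, 40.0, 380.0, 200.0]]
```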
Content Understanding
Social Media Content Analysis
Analyzes images posted on social media to extract information such as captions, detected objects, and embedded text.
Achieves a Recall@1 (R@1) of 84.4 on the Flickr30k test set
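R@1 (Recall@1) counts a query as correct only when the model's top-ranked prediction is a hit. Assuming the Flickr30k figure refers to phrase grounding, where a hit is commonly defined as a top-1 predicted box with IoU of at least 0.5 against the ground-truth box, the metric can be sketched as below. The 0.5 threshold and the one-ground-truth-box-per-phrase simplification are assumptions; the official evaluation protocol may differ in details. iou and recall_at_1 are hypothetical helpers, and the boxes are made-up inputs.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_boxes: List[Box], gt_boxes: List[Box], threshold: float = 0.5) -> float:
    """Fraction of phrases whose top-1 box matches the ground truth at IoU >= threshold."""
    if len(top1_boxes) != len(gt_boxes):
        raise ValueError("need exactly one top-1 prediction per ground-truth box")
    if not gt_boxes:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(top1_boxes, gt_boxes))
    return hits / len(gt_boxes)


# Hypothetical example: 3 phrases, 2 grounded correctly.
preds = [(0, 0, 10, 10), (20, 20, 30, 30), (0, 0, 5, 5)]
truth = [(0, 0, 10, 10), (21, 21, 31, 31), (50, 50, 60, 60)]
print(recall_at_1(preds, truth))  # 0.6666666666666666
```

In the second pair the boxes overlap by 81 of 119 square pixels (IoU of about 0.68), so it counts as a hit; the third pair does not overlap at all.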