Florence 2 Base Ft

Developed by Microsoft
Florence-2 is an advanced vision foundation model developed by Microsoft, employing a prompt-based approach to handle a wide range of vision and vision-language tasks.
Downloads 56.78k
Release Time: 6/15/2024

Model Overview

Florence-2 is a unified vision representation model capable of performing various vision tasks such as image captioning, object detection, and segmentation through simple text prompts.

Model Features

Unified Vision Representation
Handles multiple vision tasks including image captioning, object detection, and segmentation with a single model.
Prompt-based Task Execution
Executes different tasks through simple text prompts without requiring separate models.
Large-scale Pretraining
Trained using the FLD-5B dataset containing 126 million images and 5.4 billion annotations.

Model Capabilities

Image Caption Generation
Fine-grained Image Captioning
Object Detection
Dense Region Description
Text Recognition (OCR)
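
The prompt-based design described above can be sketched in Python as follows. This is a minimal illustration, not the official usage: the repository id `microsoft/Florence-2-base-ft`, the task prompt tokens, and the `post_process_generation` call are assumptions taken from the public Hugging Face model card rather than from this page, and the helper names `prompt_for`, `load`, and `run_task` are hypothetical.

```python
"""Sketch: one Florence-2 checkpoint, several tasks selected by text prompt."""

# Assumed mapping from the capabilities listed above to Florence-2 task
# prompt tokens (as published on the Hugging Face model card).
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "dense_region_caption": "<DENSE_REGION_CAPTION>",
    "ocr": "<OCR>",
}

MODEL_ID = "microsoft/Florence-2-base-ft"  # assumed repository id


def prompt_for(capability: str) -> str:
    """Return the task prompt token for a capability name (hypothetical helper)."""
    try:
        return TASK_PROMPTS[capability]
    except KeyError:
        known = ", ".join(sorted(TASK_PROMPTS))
        raise ValueError(f"unknown capability {capability!r}; expected one of: {known}") from None


def load(model_id: str = MODEL_ID):
    """Load the model and processor once; every task reuses the same pair."""
    # Imported lazily so the prompt helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor


def run_task(model, processor, image, capability: str):
    """Run one task on a PIL image; only the text prompt changes between tasks."""
    prompt = prompt_for(capability)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    # Special tokens are kept because the post-processor parses location tokens.
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=prompt, image_size=(image.width, image.height)
    )
```

With a PIL image in hand, `model, processor = load()` followed by `run_task(model, processor, image, "ocr")` and `run_task(model, processor, image, "object_detection")` would reuse the same weights for both tasks. Loading requires network access and the `transformers`, `torch`, and `Pillow` packages.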

Use Cases

Computer Vision
Image Caption Generation
Generates natural language descriptions for images
COCO Caption CIDEr score 140.0
Object Detection
Detects and locates objects in images
COCO detection mAP 41.4
Visual Question Answering
Answers questions about image content
VQAv2 accuracy 79.7%