
Florence 2 Base Ft

Developed by lodestones
Florence-2 is an advanced vision foundation model originally developed by Microsoft. It uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
Release Time: 6/19/2024

Model Overview

Florence-2 is a multi-task vision foundation model that performs tasks such as image captioning, object detection, and segmentation in response to simple text prompts. It is trained with multi-task learning on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images.
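For a rough sense of scale, dividing the annotation count by the image count gives the average annotation density of FLD-5B. This is only a mean; the figures given here say nothing about how annotations are spread across images or annotation types.

```latex
\frac{5.4 \times 10^{9}\ \text{annotations}}{1.26 \times 10^{8}\ \text{images}} \approx 42.9\ \text{annotations per image}
```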

Model Features

Multi-task Unified Model
Performs a variety of vision tasks through simple text prompts, without training a separate model for each task.
Large-scale Pre-training
Pre-trained on the FLD-5B dataset, which contains 126 million images and 5.4 billion annotations.
Zero-shot Capability
Performs well on evaluation tasks without task-specific training data.

Model Capabilities

Image Caption Generation
Object Detection
Image Segmentation
Text Recognition
Visual Question Answering
Region Proposal
Dense Region Description
Phrase Localization from Descriptions
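Each capability listed above is selected by a task token placed at the start of the text prompt. The sketch below maps the listed capabilities to task tokens as they appear in Microsoft's public Florence-2 usage examples; the mapping itself, the `TASK_TOKENS` table, and the `build_prompt` helper are illustrative assumptions, so check the tokens against the processor bundled with this checkpoint before relying on them. Visual question answering is left out because no dedicated token for it appears in the public examples this sketch assumes, and segmentation is mapped to the referring-expression variant only as one plausible choice.

```python
from typing import Dict, Optional

# Hypothetical mapping from the capabilities listed above to Florence-2 task
# tokens taken from Microsoft's public usage examples (assumption; verify).
TASK_TOKENS: Dict[str, str] = {
    "Image Caption Generation": "<CAPTION>",
    "Object Detection": "<OD>",
    "Image Segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "Text Recognition": "<OCR>",
    "Region Proposal": "<REGION_PROPOSAL>",
    "Dense Region Description": "<DENSE_REGION_CAPTION>",
    "Phrase Localization from Descriptions": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tokens that expect free text (a phrase or caption) appended after them.
TOKENS_WITH_TEXT = {"<REFERRING_EXPRESSION_SEGMENTATION>", "<CAPTION_TO_PHRASE_GROUNDING>"}


def build_prompt(capability: str, text: Optional[str] = None) -> str:
    """Return the prompt string for one capability (illustrative helper)."""
    if capability not in TASK_TOKENS:
        raise KeyError(f"no task token assumed for {capability!r}")
    token = TASK_TOKENS[capability]
    if token in TOKENS_WITH_TEXT:
        if not text:
            raise ValueError(f"{capability!r} needs a text input")
        # The public examples append the text directly after the token.
        return token + text
    if text:
        raise ValueError(f"{capability!r} does not take a text input")
    return token


if __name__ == "__main__":
    print(build_prompt("Object Detection"))
    print(build_prompt("Phrase Localization from Descriptions", "A green car parked in front of a yellow building."))
```

Keeping the token table separate from the helper makes it easy to correct a single entry if the checkpoint's processor uses different tokens.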

Use Cases

Computer Vision
Intelligent Image Analysis
Automatically generates image captions and identifies objects within images.
Achieved a CIDEr score of 133.0 on the COCO Caption test set.
Object Detection
Detects objects in images and locates their positions.
Achieved an mAP of 34.7 on the COCO detection validation set.
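Detection outputs come back as text in which each label is followed by location tokens. The sketch below assumes the conventions of the public Florence-2 processor code as we understand them: coordinates are quantized into 1000 bins per axis, written as `<loc_N>` tokens in x1, y1, x2, y2 order, and mapped back to the centre of each bin. `parse_detections` and `dequantize` are hypothetical helpers; in practice the processor's `post_process_generation` method performs this step.

```python
import re
from typing import List, Tuple

NUM_BINS = 1000  # assumption: coordinates quantized into 1000 bins per axis

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

_SPECIAL = re.compile(r"</?s>|<pad>")             # sequence and padding tokens
_ITEM = re.compile(r"([^<]+)((?:<loc_\d+>){4})")  # a label followed by four location tokens
_LOC = re.compile(r"<loc_(\d+)>")


def dequantize(bin_index: int, size: int) -> float:
    """Map a bin index back to the pixel coordinate at the centre of that bin."""
    return (bin_index + 0.5) * size / NUM_BINS


def parse_detections(raw: str, width: int, height: int) -> List[Tuple[str, Box]]:
    """Turn raw detection text into (label, box) pairs (hypothetical helper)."""
    text = _SPECIAL.sub("", raw)  # strip special tokens so they are not read as labels
    detections = []
    for label, locs in _ITEM.findall(text):
        x1, y1, x2, y2 = (int(v) for v in _LOC.findall(locs))
        box = (
            dequantize(x1, width),
            dequantize(y1, height),
            dequantize(x2, width),
            dequantize(y2, height),
        )
        detections.append((label.strip(), box))
    return detections


if __name__ == "__main__":
    raw = "</s><s>car<loc_52><loc_333><loc_932><loc_774>door<loc_100><loc_400><loc_200><loc_500></s>"
    for label, box in parse_detections(raw, width=640, height=480):
        print(label, [round(v, 1) for v in box])
```

This handles only the simple case where every label is followed by exactly four location tokens; a label that owns several boxes would need extra handling.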
Vision-Language Understanding
Visual Question Answering
Answers natural language questions about image content.
Achieved an accuracy of 79.7 on the VQAv2 test set.
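VQAv2 scores an answer against ten human answers rather than a single label. The basic rule is min(1, matches / 3), so an answer given by three or more annotators earns full credit; the commonly used form averages that rule over the ten leave-one-out subsets of nine annotators, so exactly three matches scores 0.9. The sketch below implements that averaged form. The `vqa_accuracy` name is ours, and the answer normalization is deliberately simplified to lowercasing and collapsing whitespace; the official evaluation script also normalizes punctuation, articles, number words, and contractions.

```python
from typing import Sequence


def _normalize(answer: str) -> str:
    # Simplified normalization (assumption): lowercase and collapse whitespace.
    return " ".join(answer.lower().split())


def vqa_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    """Leave-one-out VQA accuracy for one question (illustrative helper)."""
    if len(human_answers) < 2:
        raise ValueError("need at least two human answers")
    pred = _normalize(prediction)
    answers = [_normalize(a) for a in human_answers]
    scores = []
    for i in range(len(answers)):
        others = answers[:i] + answers[i + 1:]  # drop one annotator
        matches = sum(1 for a in others if a == pred)
        scores.append(min(1.0, matches / 3.0))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    answers = ["yes"] * 2 + ["no"] * 8
    print(round(vqa_accuracy("Yes", answers), 3))  # two of ten annotators agree
```

With two matching annotators out of ten the score works out to 0.6, which shows how partial agreement earns partial credit.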
Referring Expression Comprehension
Locates specific regions in images based on natural language descriptions.
Achieved an accuracy of 92.6 on the RefCOCO validation set.
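Referring expression comprehension on RefCOCO is commonly scored as the fraction of expressions whose predicted box overlaps the ground-truth box with an intersection-over-union (IoU) of at least 0.5. The sketch below assumes that convention and (x1, y1, x2, y2) pixel boxes; `iou` and `accuracy_at_iou` are illustrative helpers, not part of any Florence-2 API.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(preds: Sequence[Box], targets: Sequence[Box], threshold: float = 0.5) -> float:
    """Fraction of predictions whose IoU with the paired target is at least `threshold`."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("need equal-length, non-empty prediction and target lists")
    hits = sum(1 for p, t in zip(preds, targets) if iou(p, t) >= threshold)
    return hits / len(preds)


if __name__ == "__main__":
    preds = [(10, 10, 50, 50), (0, 0, 20, 20)]
    targets = [(12, 12, 52, 52), (30, 30, 60, 60)]
    print(accuracy_at_iou(preds, targets))
```

Each expression has exactly one target box, so the score is a plain hit rate; the same `iou` function is also the matching criterion underlying COCO-style detection mAP.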