
Florence-2 Large FT

Developed by zhangfaen
Florence-2 is an advanced vision foundation model developed by Microsoft, employing a prompt-based paradigm to handle various vision and vision-language tasks.
Downloads: 14
Release Time: 7/2/2024

Model Overview

Florence-2 is a unified visual representation model. It performs multiple vision tasks, such as image captioning, object detection, and segmentation, in response to simple text prompts. It was trained with multi-task learning on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images.
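In Florence-2's public example code, every task is driven through a single text prompt. The prompt is a task token such as <CAPTION> or <OD>, optionally followed by free text that is appended directly to it. The sketch below composes such prompts. The token sets are only a subset of the tokens used in that example code. The helper name build_prompt and its validation rules are hypothetical and not part of any library. The resulting string would then be passed, together with the image, to the model's processor.

```python
from typing import Optional

# Subset of task tokens from Florence-2's example usage (not exhaustive).
# Hypothetical split: tasks that take only the image vs. tasks that also need text.
TASKS_WITHOUT_TEXT = {
    "<CAPTION>", "<DETAILED_CAPTION>", "<MORE_DETAILED_CAPTION>",
    "<OD>", "<DENSE_REGION_CAPTION>", "<REGION_PROPOSAL>",
    "<OCR>", "<OCR_WITH_REGION>",
}
TASKS_WITH_TEXT = {
    "<CAPTION_TO_PHRASE_GROUNDING>",
    "<REFERRING_EXPRESSION_SEGMENTATION>",
    "<OPEN_VOCABULARY_DETECTION>",
}


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Hypothetical helper: task token, optionally followed by free text (no separator)."""
    if task in TASKS_WITHOUT_TEXT:
        if text_input:
            raise ValueError(f"{task} does not take extra text input")
        return task
    if task in TASKS_WITH_TEXT:
        if not text_input:
            raise ValueError(f"{task} requires text input")
        return task + text_input
    raise ValueError(f"unknown task token: {task}")


print(build_prompt("<OD>"))
print(build_prompt("<CAPTION_TO_PHRASE_GROUNDING>", "A green car"))
```

Rejecting extra text for image-only tasks is a design choice of this sketch. It catches prompt mistakes early rather than letting the model receive an unexpected prompt.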

Model Features

Unified Multi-task Processing
Handles multiple vision tasks through simple text prompts without requiring specialized architectures for different tasks.
Large-scale Pretraining
Trained on the FLD-5B dataset containing 126 million images and 5.4 billion annotations.
Zero-shot Capability
Performs well on unseen tasks, reducing dependence on task-specific training data.
Fine-grained Visual Understanding
Capable of generating detailed image descriptions and precisely locating objects and regions within images.

Model Capabilities

Image Caption Generation
Object Detection
Image Segmentation
Text Recognition
Visual Question Answering
Region Proposal
Dense Region Description
Phrase Grounding
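Several of the capabilities above return image regions: object detection, region proposal, dense region description, and phrase grounding. Florence-2 represents coordinates as location tokens by quantizing them into 1,000 bins per axis. In generated text they appear as <loc_0> through <loc_999>. The model's processor provides a post_process_generation method that decodes them. The sketch below is a simplified, hypothetical re-implementation for illustration only. It assumes one box per label, (x1, y1, x2, y2) token order, and dequantization to the bin centre. Phrases with several boxes are not handled.

```python
import re
from typing import List, Tuple

NUM_BINS = 1000  # assumption: 1,000 quantization bins per axis

# A label (any text without '<') followed by exactly four location tokens.
BOX_PATTERN = re.compile(r"([^<]+)((?:<loc_\d+>){4})")
LOC_PATTERN = re.compile(r"<loc_(\d+)>")


def dequantize(bin_index: int, size: int, num_bins: int = NUM_BINS) -> float:
    """Map a bin index back to a pixel coordinate at the centre of that bin."""
    return (bin_index + 0.5) * size / num_bins


def parse_boxes(text: str, width: int, height: int) -> List[Tuple[str, List[float]]]:
    """Hypothetical parser: turn 'label<loc_..>x4' runs into (label, pixel box) pairs."""
    # Strip sequence/padding tokens so they are not mistaken for labels.
    for special in ("</s>", "<s>", "<pad>"):
        text = text.replace(special, "")
    results = []
    for label, locs in BOX_PATTERN.findall(text):
        x1, y1, x2, y2 = (int(v) for v in LOC_PATTERN.findall(locs))
        box = [
            dequantize(x1, width), dequantize(y1, height),
            dequantize(x2, width), dequantize(y2, height),
        ]
        results.append((label.strip(), box))
    return results


raw = "</s><s>car<loc_0><loc_0><loc_999><loc_499>wheel<loc_100><loc_200><loc_300><loc_400></s>"
for label, box in parse_boxes(raw, width=1000, height=500):
    print(label, box)
```

Dequantizing to the bin centre rather than the bin edge keeps the average rounding error at zero. For real inference, prefer the processor's own post-processing, which handles the task-specific output formats.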

Use Cases

Computer Vision
Intelligent Image Analysis
Automatically generates detailed descriptions and content analysis of images.
Achieves a CIDEr score of 135.6 on the COCO Caption test set.
Object Detection
Identifies objects and their locations within images.
Achieves an mAP of 37.5 on the COCO detection validation set.
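COCO-style mAP counts a predicted box as correct only if its intersection-over-union (IoU) with a ground-truth box reaches a threshold. It then averages AP over ten thresholds, from 0.50 to 0.95 in steps of 0.05. The sketch below shows IoU and those thresholds for a single prediction/ground-truth pair. It is not a full mAP evaluator, which also needs score ranking, per-class matching, and precision-recall interpolation; pycocotools is the usual tool for that. The function names are illustrative.

```python
from typing import List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def passed_thresholds(pred: Sequence[float], gt: Sequence[float]) -> List[float]:
    """Thresholds at which this prediction would count as a match for this ground truth."""
    score = iou(pred, gt)
    return [t for t in COCO_IOU_THRESHOLDS if score >= t]


print(iou([0, 0, 10, 10], [5, 0, 15, 10]))
print(passed_thresholds([0, 0, 100, 72], [0, 0, 100, 100]))
```

A box with IoU 0.72 counts as a hit at 5 of the 10 thresholds. This is why COCO mAP rewards tight localization more than the older single-threshold (IoU 0.5) metric does.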
Document Processing
Document Image Understanding
Recognizes and extracts text and structure from document images.
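For document images, region-level OCR output has to be put back into reading order. Here we assume the post-processed result for the <OCR_WITH_REGION> task is a dict with parallel "quad_boxes" (eight coordinates per region) and "labels" (recognized text) lists. The sketch below collapses each quadrilateral to an axis-aligned box. It then sorts regions top to bottom, and left to right within a line. The line-grouping tolerance and the function names are hypothetical choices for illustration, not Florence-2 behaviour.

```python
from typing import Dict, List, Sequence, Tuple


def quad_to_box(quad: Sequence[float]) -> List[float]:
    """Collapse an 8-value quadrilateral (x1, y1, ..., x4, y4) into (x1, y1, x2, y2)."""
    xs, ys = quad[0::2], quad[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]


def reading_order(ocr: Dict[str, list], line_tol: float = 10.0) -> List[str]:
    """Hypothetical heuristic: regions whose vertical centres differ by less than
    line_tol pixels (relative to the first region of a line) share a line."""
    items: List[Tuple[float, float, str]] = []
    for quad, text in zip(ocr["quad_boxes"], ocr["labels"]):
        x1, y1, x2, y2 = quad_to_box(quad)
        items.append(((y1 + y2) / 2, x1, text))
    items.sort()  # top to bottom by vertical centre

    lines: List[List[Tuple[float, float, str]]] = []
    for item in items:
        if lines and abs(item[0] - lines[-1][0][0]) < line_tol:
            lines[-1].append(item)
        else:
            lines.append([item])

    ordered: List[str] = []
    for line in lines:
        # Left to right within each line.
        ordered.extend(text for _, _, text in sorted(line, key=lambda it: it[1]))
    return ordered


ocr = {
    "quad_boxes": [
        [60, 0, 100, 0, 100, 20, 60, 20],
        [0, 2, 50, 2, 50, 22, 0, 22],
        [0, 40, 80, 40, 80, 60, 0, 60],
    ],
    "labels": ["World", "Hello", "Next line"],
}
print(reading_order(ocr))
```

A fixed pixel tolerance is the simplest option. For documents with varying font sizes, a tolerance proportional to each region's height would likely group lines more reliably.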
Assistive Technology
Visual Assistance
Provides image content descriptions for visually impaired individuals.