Florence 2 Large

Developed by Binaryy
Florence-2 is an advanced vision foundation model developed by Microsoft, using a prompt-based approach to handle a wide range of vision and vision-language tasks.
Downloads 24
Release Time: 6/27/2024

Model Overview

Florence-2 is a unified visual representation model capable of performing various vision tasks such as image captioning, object detection, and segmentation through simple text prompts. It is trained on the FLD-5B dataset containing 126 million images and 5.4 billion annotations, excelling in both zero-shot and fine-tuned scenarios.

Model Features

Unified Visual Representation
Handles multiple vision tasks with a single model, reducing the need for specialized models.
Prompt-based Task Execution
Switches between different task modes with simple text prompts.
Large-scale Pretraining
Trained on the FLD-5B dataset containing 126 million images and 5.4 billion annotations.
Zero-shot Capability
Performs well on tasks it hasn't been specifically trained for.
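
To make the prompt-based switching concrete, here is a minimal sketch of running one task through the Hugging Face `transformers` interface. It assumes the `microsoft/Florence-2-large` checkpoint, loaded with `trust_remote_code=True`, and the task tokens from the upstream model card such as `<OD>` and `<CAPTION>`. The helper names `build_prompt` and `run_task` are hypothetical, and the generation settings are illustrative rather than required.

```python
from typing import Optional

# Assumption: the upstream Microsoft checkpoint; replace with the copy you actually use.
MODEL_ID = "microsoft/Florence-2-large"


def build_prompt(task_token: str, text_input: Optional[str] = None) -> str:
    """Return the text prompt: the task token, optionally followed by extra text."""
    return task_token if text_input is None else task_token + text_input


def run_task(image_path: str, task_token: str, text_input: Optional[str] = None) -> dict:
    """Run a single task on a single image and return the parsed result."""
    # Heavy imports stay local so build_prompt works without them installed.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompt(task_token, text_input),
        images=image,
        return_tensors="pt",
    )
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    # Special tokens are kept because the post-processor uses them to parse the output.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text,
        task=task_token,
        image_size=(image.width, image.height),
    )
```

Calling `run_task("photo.jpg", "<OD>")` and then `run_task("photo.jpg", "<CAPTION>")` would switch tasks by changing only the prompt. The sketch reloads the model on every call for brevity; in practice the model and processor would be loaded once and reused.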

Model Capabilities

Image caption generation
Object detection
Image segmentation
Text recognition
Region proposal
Dense region captioning
Visual question answering
Referring expression comprehension
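
As a rough guide, the capabilities above can be paired with task tokens from the upstream Florence-2 model card. The pairing below is an assumption made for illustration: the tokens themselves appear in the upstream card, but which token best matches each listed capability is a judgment call, and the names `CAPABILITY_TO_TASK_TOKEN`, `task_token_for`, and `needs_text_input` are hypothetical. Visual question answering is left out because no dedicated token for it is confirmed here.

```python
# Assumed pairing of listed capabilities with upstream Florence-2 task tokens.
CAPABILITY_TO_TASK_TOKEN = {
    "image caption generation": "<CAPTION>",
    "object detection": "<OD>",
    "image segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "text recognition": "<OCR>",
    "region proposal": "<REGION_PROPOSAL>",
    "dense region captioning": "<DENSE_REGION_CAPTION>",
    "referring expression comprehension": "<CAPTION_TO_PHRASE_GROUNDING>",
}

# Tokens that expect extra text (for example a phrase to locate) after the token.
TOKENS_NEEDING_TEXT = {
    "<REFERRING_EXPRESSION_SEGMENTATION>",
    "<CAPTION_TO_PHRASE_GROUNDING>",
}


def task_token_for(capability: str) -> str:
    """Look up the task token for a capability name, ignoring case and outer spaces."""
    key = capability.strip().lower()
    if key not in CAPABILITY_TO_TASK_TOKEN:
        raise KeyError(f"No task token mapped for capability: {capability!r}")
    return CAPABILITY_TO_TASK_TOKEN[key]


def needs_text_input(task_token: str) -> bool:
    """Return True when the task token is normally followed by free text."""
    return task_token in TOKENS_NEEDING_TEXT
```

For example, `task_token_for("Object detection")` returns `"<OD>"`, which is used as the whole prompt, while the token for referring expression comprehension is followed by the phrase to be located.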

Use Cases

Computer Vision
Intelligent Image Analysis
Automatically generates image captions and detects objects in images.
Reported result: a CIDEr score of 135.6 on the COCO captioning test set.
Document Processing
Identifies and extracts text information from images.
Assistive Technology
Visual Assistance
Describes image content for visually impaired individuals.
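
For use cases like these, the model's parsed output usually needs a small amount of post-processing before it is shown or read aloud. The sketch below assumes the object detection result has the shape shown in the upstream model card, a dictionary keyed by the task token with `bboxes` and `labels` lists. The function name `summarize_detections` and the sentence format are hypothetical.

```python
from collections import Counter


def summarize_detections(parsed: dict, task_token: str = "<OD>") -> str:
    """Turn a parsed detection result into a one-line text summary."""
    # Assumed shape: {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}}
    labels = parsed.get(task_token, {}).get("labels", [])
    if not labels:
        return "No objects detected."
    counts = Counter(labels)
    # Most frequent labels come first; ties keep their first-seen order.
    parts = [f"{count} x {label}" for label, count in counts.most_common()]
    return "Detected: " + ", ".join(parts) + "."


if __name__ == "__main__":
    sample = {
        "<OD>": {
            "bboxes": [[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 2, 2]],
            "labels": ["car", "person", "car"],
        }
    }
    print(summarize_detections(sample))
```

A summary like this could be passed to a text-to-speech system for visual assistance, or logged alongside a generated caption for image analysis.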