Florence 2 Large No Flash Attn

Developed by multimodalart
Florence-2 is a vision foundation model developed by Microsoft. It uses a prompt-based approach and a unified representation to handle diverse visual tasks, such as image captioning and object detection, with a single model.
Downloads 73.91k
Release Time: 8/29/2024

Model Overview

Florence-2 is a sequence-to-sequence vision foundation model that performs a variety of vision and vision-language tasks, including image captioning, object detection, and segmentation, in response to simple text prompts. It was pre-trained on the FLD-5B dataset, which contains 126 million images, and performs well both zero-shot and after fine-tuning.

Model Features

Unified Visual Representation
Handles multiple visual tasks through a single model architecture, reducing the need for specialized models
Prompt-Driven Task Execution
Switches between different task modes using simple text prompts (e.g., <OD>)
Large-Scale Pretraining
Trained on the FLD-5B dataset with 126 million images and 5.4 billion annotations
Zero-Shot Capability
Performs a range of visual tasks without task-specific fine-tuning
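
The prompt-driven task switching listed above can be sketched with the Hugging Face transformers library. This is a minimal sketch under stated assumptions, not the uploader's reference code: the repository id in MODEL_ID is an assumption inferred from the model name, build_prompt and run_task are hypothetical helper names, and the generation settings (max_new_tokens, num_beams) are illustrative values rather than tuned ones.

```python
from typing import Any

# Assumption: Hugging Face repo id inferred from the model name; verify before use.
MODEL_ID = "multimodalart/Florence-2-large-no-flash-attn"


def build_prompt(task: str, text_input: str = "") -> str:
    """Return the text prompt: a task token such as <OD>, plus optional extra text."""
    if not (task.startswith("<") and task.endswith(">")):
        raise ValueError(f"task token should look like '<OD>', got {task!r}")
    return task + text_input


def run_task(image: Any, task: str = "<OD>", text_input: str = "") -> dict:
    """Run one prompt-selected task on a PIL image and return the parsed result."""
    # Heavy imports stay local so this module can be imported without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # trust_remote_code=True is required because the model code ships with the checkpoint.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    inputs = processor(
        text=build_prompt(task, text_input), images=image, return_tensors="pt"
    ).to(device, dtype)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,  # illustrative value
        num_beams=3,          # illustrative value
        do_sample=False,
    )
    # Special tokens are kept because the post-processor parses location tokens.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text, task=task, image_size=(image.width, image.height)
    )
```

Calling run_task(image, "<OD>") on a PIL image would switch the same weights to object detection. In real use the model and processor would be loaded once and reused rather than reloaded on every call.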

Model Capabilities

Image Caption Generation
Object Detection
Image Segmentation
Text Recognition
Region Proposal Generation
Dense Region Description
Visual Question Answering
Referring Expression Comprehension
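
As a rough guide, the capabilities above can be paired with task prompt tokens. Only <OD> appears in this listing; the other tokens in the sketch below follow the upstream Microsoft Florence-2 model card and should be checked against it before use. The pairing of Referring Expression Comprehension with the phrase-grounding token is an assumption, and no token is listed for Visual Question Answering because none is known for the base checkpoint. TASK_PROMPTS and prompts_for are hypothetical names.

```python
from typing import Dict, List

# Capability name -> candidate task prompt tokens.
# Assumption: tokens other than <OD> come from the upstream Florence-2 model card.
TASK_PROMPTS: Dict[str, List[str]] = {
    "image caption generation": ["<CAPTION>", "<DETAILED_CAPTION>", "<MORE_DETAILED_CAPTION>"],
    "object detection": ["<OD>"],
    "image segmentation": ["<REFERRING_EXPRESSION_SEGMENTATION>", "<REGION_TO_SEGMENTATION>"],
    "text recognition": ["<OCR>", "<OCR_WITH_REGION>"],
    "region proposal generation": ["<REGION_PROPOSAL>"],
    "dense region description": ["<DENSE_REGION_CAPTION>"],
    # Assumed pairing: phrase grounding is the closest documented task.
    "referring expression comprehension": ["<CAPTION_TO_PHRASE_GROUNDING>"],
}


def prompts_for(capability: str) -> List[str]:
    """Return candidate prompt tokens for a capability, or [] if none is listed."""
    return list(TASK_PROMPTS.get(capability.strip().lower(), []))


if __name__ == "__main__":
    for name in ("Object Detection", "Text Recognition", "Visual Question Answering"):
        print(name, "->", prompts_for(name))
```

The lookup is case-insensitive and returns a copy, so callers can modify the result without changing the table.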

Use Cases

Computer Vision
Intelligent Image Analysis
Automatically generates image captions and identifies key objects
Achieves a CIDEr score of 135.6 on the COCO caption test set
Document Processing
Recognizes and extracts text information from images
Supports text recognition with region localization
Content Understanding
Social Media Analysis
Analyzes image content and generates tags and descriptions
E-Commerce
Automatically generates product image descriptions and recognizes product attributes
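
For the document-processing use case above, text recognition with region localization yields text strings paired with regions. The sketch below assumes the parsed result is a dictionary keyed by <OCR_WITH_REGION> with parallel "quad_boxes" (eight numbers per region: four x, y corners) and "labels" lists, as in the upstream Florence-2 model card; that layout is an assumption, the sample values are made up, and quad_to_bbox and text_regions are hypothetical helpers.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def quad_to_bbox(quad: Sequence[float]) -> Box:
    """Convert a quadrilateral [x1, y1, ..., x4, y4] to an axis-aligned box."""
    if len(quad) != 8:
        raise ValueError(f"expected 8 coordinates, got {len(quad)}")
    xs, ys = quad[0::2], quad[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def text_regions(result: Dict) -> List[Tuple[str, Box]]:
    """Pair each recognized string with the bounding box of its region."""
    # Assumed layout: {"<OCR_WITH_REGION>": {"quad_boxes": [...], "labels": [...]}}
    payload = result["<OCR_WITH_REGION>"]
    return [
        (label.strip(), quad_to_bbox(quad))
        for label, quad in zip(payload["labels"], payload["quad_boxes"])
    ]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = {
        "<OCR_WITH_REGION>": {
            "quad_boxes": [[10, 20, 110, 22, 108, 60, 9, 58]],
            "labels": [" INVOICE "],
        }
    }
    print(text_regions(sample))
```

Axis-aligned boxes lose the rotation of slanted text, so the original quadrilaterals are worth keeping when the layout matters.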