Florence 2 Large Ft Safetensors

Developed by mrhendrey
Florence-2 is a vision foundation model developed by Microsoft. It uses a prompt-based approach to handle a wide range of vision and vision-language tasks with a single model.
Downloads 162
Release Time: 10/8/2024

Model Overview

Florence-2 uses a sequence-to-sequence architecture to handle multiple tasks with one model, including image captioning, object detection, and segmentation. It was trained on the large-scale FLD-5B dataset.

Model Features

Unified visual task processing
Accomplishes various vision tasks through simple text prompts without requiring task-specific models
Large-scale pre-training
Trained on the FLD-5B dataset, which contains 126 million images and 5.4 billion annotations
Zero-shot transfer capability
Performs well on evaluation tasks it has not seen during training
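
The prompt-based workflow described above can be sketched as follows. This is a minimal sketch, not the publisher's reference code: it assumes `model` and `processor` were already loaded elsewhere (the upstream Microsoft model card does this through `transformers` with `trust_remote_code=True`), and it follows the generate/decode/post-process pattern shown there. The helper names `build_prompt` and `run_task` are hypothetical, and the generation settings are illustrative.

```python
from typing import Optional


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Combine a task token such as "<OD>" with optional free text."""
    if not (task.startswith("<") and task.endswith(">")):
        raise ValueError(f"task token should look like '<OD>', got {task!r}")
    return task if text_input is None else task + text_input


def run_task(model, processor, image, task: str, text_input: Optional[str] = None):
    """Run one task on one PIL image and return the parsed result.

    Assumes `model` and `processor` are Florence-2 objects loaded by the
    caller, and that `image` is a PIL image exposing .width and .height.
    """
    prompt = build_prompt(task, text_input)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,  # illustrative value, tune as needed
        num_beams=3,          # illustrative value, tune as needed
    )
    # Special tokens are kept because location tokens carry the coordinates.
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=task, image_size=(image.width, image.height)
    )
```

Switching tasks then only means changing the `task` string, for example `run_task(model, processor, image, "<OD>")` for detection or `run_task(model, processor, image, "<CAPTION>")` for captioning.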

Model Capabilities

Image caption generation
Object detection
Image segmentation
Text recognition
Visual question answering
Referring expression comprehension
Region description generation
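
Each capability above is selected with a task token in the prompt. The lookup below is a rough sketch that pairs the listed capabilities with task tokens from the upstream Florence-2 model card. The pairing itself is an assumption made for illustration, and the names `CAPABILITY_TO_TASK`, `TASKS_NEEDING_TEXT`, and `task_for` are hypothetical. Visual question answering is left out because no dedicated task token is confirmed here.

```python
# Capability names come from the list above; the token pairing is an assumption.
CAPABILITY_TO_TASK = {
    "Image caption generation": "<CAPTION>",
    "Object detection": "<OD>",
    "Image segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "Text recognition": "<OCR>",
    "Referring expression comprehension": "<CAPTION_TO_PHRASE_GROUNDING>",
    "Region description generation": "<DENSE_REGION_CAPTION>",
}

# These tasks expect extra text after the token (a phrase or expression).
TASKS_NEEDING_TEXT = {
    "<REFERRING_EXPRESSION_SEGMENTATION>",
    "<CAPTION_TO_PHRASE_GROUNDING>",
}


def task_for(capability: str) -> str:
    """Return the task token for a capability name, or raise KeyError."""
    try:
        return CAPABILITY_TO_TASK[capability]
    except KeyError:
        known = ", ".join(sorted(CAPABILITY_TO_TASK))
        raise KeyError(f"unknown capability {capability!r}; known: {known}") from None


def needs_text(task: str) -> bool:
    """Tell whether a task token must be followed by a text input."""
    return task in TASKS_NEEDING_TEXT
```

For example, `task_for("Object detection")` returns `"<OD>"`, which needs no extra text, while the grounding token must be followed by the phrase to locate.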

Use Cases

Computer vision
Intelligent image analysis
Automatically generates image captions and detects objects in images
Reaches 37.5 mAP on the COCO detection validation set
Document processing
Recognizes text and its location in images
Supports text recognition with regions
Human-computer interaction
Visual question answering system
Answers natural language questions about image content
Reaches 81.7 accuracy on VQAv2
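
For the document processing case, the text-with-regions result can be flattened into simple records. The sketch below assumes the post-processed output has the shape shown in the upstream Florence-2 model card: a dict keyed by `<OCR_WITH_REGION>` holding `quad_boxes` (eight numbers per region, four x/y corner pairs) and `labels`. The helper names `quad_to_bbox` and `ocr_regions` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]


def quad_to_bbox(quad: Sequence[float]) -> BBox:
    """Convert [x1, y1, x2, y2, x3, y3, x4, y4] to (x_min, y_min, x_max, y_max)."""
    if len(quad) != 8:
        raise ValueError(f"expected 8 coordinates, got {len(quad)}")
    xs = quad[0::2]  # every even index is an x coordinate
    ys = quad[1::2]  # every odd index is a y coordinate
    return (min(xs), min(ys), max(xs), max(ys))


def ocr_regions(result: Dict[str, dict]) -> List[dict]:
    """Turn an assumed <OCR_WITH_REGION> result into text/bbox records."""
    payload = result["<OCR_WITH_REGION>"]
    return [
        {"text": label, "bbox": quad_to_bbox(quad)}
        for quad, label in zip(payload["quad_boxes"], payload["labels"])
    ]


# Hand-written sample in the assumed shape, not real model output.
sample = {
    "<OCR_WITH_REGION>": {
        "quad_boxes": [[10, 20, 110, 22, 108, 60, 8, 58]],
        "labels": ["Hello"],
    }
}
records = ocr_regions(sample)
```

Axis-aligned boxes lose the rotation of slanted text, so keep the original quadrilaterals when the text orientation matters.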