FG-CLIP Base

Developed by qihoo360
FG-CLIP is a fine-grained visual and text alignment model that achieves global and region-level image-text alignment through two-stage training.
Downloads 692
Release Time: 5/8/2025

Model Overview

FG-CLIP targets fine-grained alignment between images and text. Its two-stage training lets it match text not only to whole images but also to individual image regions.

Model Features

Two-stage Training
The first stage aligns whole images with their global captions; the second stage adds region-level captions to refine that alignment for individual regions.
Fine-grained Alignment
Handles fine-grained visual and text alignment tasks, including matching region-level descriptions to image regions.
Dense Feature Extraction
Supports obtaining dense features of images for more detailed visual analysis.
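
This page does not state which objective the two-stage training described above uses. As a rough illustration only, assuming a CLIP-style symmetric contrastive loss, the two stages could be sketched as follows: stage one scores whole-image embeddings against global caption embeddings, and stage two adds the same loss over region embeddings and region captions. Every name here, the temperature value, and the random embeddings are placeholders, not FG-CLIP's actual training code.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style loss (assumed, not confirmed by the source).

    Row i of image_emb is paired with row i of text_emb; all other rows
    in the batch act as negatives.
    """
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

# Placeholder embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)

# Stage 1: whole-image embeddings vs. global caption embeddings.
global_img = rng.normal(size=(4, 8))
global_txt = global_img + 0.1 * rng.normal(size=(4, 8))
stage1_loss = symmetric_contrastive_loss(global_img, global_txt)

# Stage 2: additionally align region embeddings with region captions.
region_img = rng.normal(size=(6, 8))
region_txt = region_img + 0.1 * rng.normal(size=(6, 8))
region_loss = symmetric_contrastive_loss(region_img, region_txt)
stage2_loss = stage1_loss + region_loss  # equal weighting is an assumption

print(f"stage 1 loss: {stage1_loss:.4f}, stage 2 loss: {stage2_loss:.4f}")
```

The loss is small when each image (or region) is closest to its own caption and grows when pairs are mismatched, which is the behaviour both stages rely on.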

Model Capabilities

Zero-shot Image Classification
Image-Text Matching
Fine-grained Visual Analysis
Dense Feature Extraction
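
Zero-shot classification and image-text matching with a model of this kind usually come down to comparing one image embedding against several text embeddings. The sketch below shows that comparison in isolation. It assumes the embeddings have already been produced by the model's image and text encoders; the vectors, the prompts, and the `logit_scale` value are hypothetical placeholders, and no FG-CLIP API is called.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Pick the label whose text embedding is most similar to the image.

    logit_scale=100.0 is an assumed value in the range typical for
    CLIP-style models; the real model's scale may differ.
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (text_embs @ image_emb)  # one score per label
    logits = logits - logits.max()                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax over labels
    return labels[int(np.argmax(probs))], probs

# Hypothetical prompts and stand-in embeddings (not real model outputs).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = np.eye(3, 8)  # three orthogonal 8-dimensional vectors
image_emb = np.array([0.9, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0])

predicted, probs = zero_shot_classify(image_emb, text_embs, labels)
print(predicted, probs.round(3))
```

Because the labels are plain text prompts, new classes can be added by appending a prompt and its embedding, with no retraining.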

Use Cases

Image Retrieval
Image Classification
Classify images based on text descriptions
Correctly identifies cat images in the provided examples
Visual Analysis
Region Feature Analysis
Analyze features of specific regions in an image
Can generate region-level similarity heatmaps
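
A region-level similarity heatmap like the one mentioned above can be built by comparing each dense (per-patch) image feature with a single text embedding. The following is a minimal sketch under stated assumptions: dense features arrive as one vector per patch on a regular grid, and the function name, grid size, and random features are illustrative placeholders rather than the model's real outputs or API.

```python
import numpy as np

def similarity_heatmap(dense_feats, text_emb, grid_hw):
    """Cosine similarity between each patch feature and a text embedding.

    dense_feats: (H*W, D) array, one feature vector per image patch.
    text_emb:    (D,) array for the region description.
    grid_hw:     (H, W) patch grid shape.
    Returns an (H, W) map min-max scaled to [0, 1] for display.
    """
    h, w = grid_hw
    if dense_feats.shape[0] != h * w:
        raise ValueError("number of patch features must equal H * W")
    dense = dense_feats / np.linalg.norm(dense_feats, axis=1, keepdims=True)
    text = text_emb / np.linalg.norm(text_emb)
    heat = (dense @ text).reshape(h, w)
    span = heat.max() - heat.min()
    return (heat - heat.min()) / span if span > 0 else np.zeros_like(heat)

# Stand-in data: a 4x4 patch grid with 8-dimensional features.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=8)
dense_feats = rng.normal(size=(16, 8))
dense_feats[6] = text_emb  # make patch (row 1, col 2) match the text exactly

heat = similarity_heatmap(dense_feats, text_emb, (4, 4))
print(heat.round(2))
```

To overlay the result on an image, the map would typically be upsampled to the image resolution and drawn with a colour map; that step is omitted here.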