FG-CLIP Large

Developed by qihoo360
FG-CLIP is a fine-grained vision-text alignment model. Two-stage training aligns images with text at both the global level and the region level, which improves fine-grained visual understanding.
Downloads 538
Release Time: 4/29/2025

Model Overview

FG-CLIP is trained in two stages. The first stage uses global-level image-text pairs to establish a preliminary fine-grained alignment. The second stage refines that alignment by adding region-level descriptions. The resulting model is suited to fine-grained vision-text alignment tasks.

Model Features

Two-stage training
Achieves more precise vision-text alignment by training first at the global level and then at the region level.
Fine-grained alignment
Captures detailed regions in an image and aligns them precisely with text descriptions.
Dense feature visualization
Supports generating similarity heatmaps over image regions, which show where the model focuses for a given text.
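
A similarity heatmap of the kind described above can be sketched as the cosine similarity between each image-patch feature and a text feature, laid out on the patch grid. This is a minimal illustration, not FG-CLIP's own code: the function names, the row-major patch order, and the toy feature values are all assumptions, and in practice the vectors would come from the model's image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_heatmap(patch_features, text_feature, grid_h, grid_w):
    """Return a grid_h x grid_w grid of patch-to-text cosine similarities.

    patch_features is assumed to hold grid_h * grid_w vectors in
    row-major order (a hypothetical layout chosen for this sketch).
    """
    if len(patch_features) != grid_h * grid_w:
        raise ValueError("patch count does not match grid size")
    scores = [cosine(p, text_feature) for p in patch_features]
    return [scores[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]

# Toy 2x2 grid of 3-dimensional patch features (made-up values,
# standing in for real dense encoder outputs).
patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
text = [1.0, 0.0, 0.0]
heatmap = similarity_heatmap(patches, text, 2, 2)
```

Each cell of `heatmap` can then be upsampled to the image size and overlaid on the image; higher values mark patches that match the text more closely.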

Model Capabilities

Fine-grained image classification
Vision and text alignment
Image region feature extraction
Zero-shot image classification
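
Zero-shot classification with a CLIP-style model usually works by comparing one image embedding against the text embeddings of candidate label prompts and taking a softmax over the scaled similarities. The sketch below shows only that scoring step. The embeddings are made-up stand-ins for encoder outputs, `zero_shot_classify` is a hypothetical helper, and the scale of 100 is an assumed value borrowed from common CLIP practice rather than a documented FG-CLIP setting.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def zero_shot_classify(image_feature, label_features, labels, scale=100.0):
    """Pick the label whose text embedding is closest to the image embedding.

    Returns (best_label, probabilities). `scale` is an assumed logit scale.
    """
    img = l2_normalize(image_feature)
    logits = []
    for feat in label_features:
        txt = l2_normalize(feat)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))
    # Numerically stable softmax over the label logits.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Toy embeddings (made-up values) for one image and three label prompts.
labels = ["a photo of a Siamese cat",
          "a photo of a Persian cat",
          "a photo of a beagle"]
label_features = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
image_feature = [0.9, 0.1, 0.0]
predicted, probs = zero_shot_classify(image_feature, label_features, labels)
```

Because no classifier head is trained, new categories can be added simply by writing new label prompts and embedding them.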

Use Cases

Image understanding
Fine-grained image classification
Classifies images that differ only subtly, such as different breeds of cats and dogs.
The model can accurately distinguish visually similar categories.
Visual search
Description-based image retrieval
Retrieves relevant images from a text description.
The model can interpret fine-grained descriptions and return closely matching images.
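
Description-based retrieval can be sketched as ranking a gallery of precomputed image embeddings by cosine similarity to the embedding of the query text. This is an illustrative sketch under stated assumptions: `retrieve`, the image IDs, and the two-dimensional embeddings are hypothetical, and a real system would obtain the vectors from the model's encoders and likely use a vector index for large galleries.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_feature, image_features, top_k=3):
    """Return the top_k (image_id, score) pairs, best match first.

    image_features maps a hypothetical image ID to its embedding.
    """
    scored = [(cosine(query_feature, feat), image_id)
              for image_id, feat in image_features.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(image_id, score) for score, image_id in scored[:top_k]]

# Toy gallery and query embedding (made-up values).
gallery = {
    "img_001": [1.0, 0.0],
    "img_002": [0.6, 0.8],
    "img_003": [0.0, 1.0],
}
query = [0.8, 0.6]
results = retrieve(query, gallery, top_k=2)
```

Image embeddings only need to be computed once, so each new text query costs one text encoding plus the similarity ranking.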