Resnet50 Clip Gap.cc12m

Developed by timm
CLIP-style image encoder based on the ResNet50 architecture, trained on the CC12M dataset, that extracts features through Global Average Pooling (GAP).
Downloads 19
Release Time: 12/26/2024

Model Overview

This is an image feature extraction model from the timm library. It combines a ResNet50 backbone with CLIP-style contrastive training and is intended for image representation learning.
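To make "CLIP training methods" concrete, here is a minimal sketch of a symmetric contrastive loss of the kind CLIP popularized. It is illustrative only: the exact training recipe for this checkpoint is not given here, the function names are hypothetical, and the temperature of 0.07 is an assumed value.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy_row(logits, target_index):
    """Cross-entropy of one row of logits against one target, via a stable log-sum-exp."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[target_index]

def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair i of images and texts is the positive pair.

    temperature=0.07 is an assumption for illustration, not a documented setting.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Image-to-text: each image should pick out its own caption.
    loss_i2t = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Text-to-image: each caption should pick out its own image.
    loss_t2i = sum(
        cross_entropy_row([logits[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return (loss_i2t + loss_t2i) / 2

# Toy embeddings: aligned pairs should give a much lower loss than swapped pairs.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is low when each image embedding is closest to its own caption embedding and high otherwise, which is what pushes the image encoder toward semantically meaningful features.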

Model Features

CLIP-style training
Trained with a CLIP-like contrastive objective on image-text pairs to strengthen its image representations.
Global Average Pooling
Pools the final feature map with GAP (Global Average Pooling) rather than passing it through fully connected layers, yielding a fixed-length feature vector well suited to feature extraction.
Large-scale pretraining
Pretrained on the CC12M dataset (approximately 12 million image-text pairs)
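As a rough illustration of the Global Average Pooling step named above, the sketch below averages each channel of a tiny feature map over its spatial positions. The function name and the toy values are hypothetical; a real ResNet50 final feature map has far more channels and comes from the network rather than a hand-written list.

```python
def global_average_pool(feature_map):
    """Average each channel of a C x H x W feature map over all spatial positions."""
    pooled = []
    for channel in feature_map:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled

# Toy feature map: 2 channels, each 2 x 2.
toy_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
embedding = global_average_pool(toy_map)
print(embedding)  # [2.5, 2.0]
```

The output has one value per channel regardless of the spatial size of the input, which is why GAP gives a fixed-length embedding.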

Model Capabilities

Image feature extraction
Visual representation learning
Image embedding generation

Use Cases

Computer vision
Image retrieval
Extracts image features for similar-image search
Multimodal learning
Serves as a visual encoder for tasks like image-text matching
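The image retrieval use case can be sketched as ranking stored embeddings by cosine similarity to a query embedding. This is a minimal sketch under stated assumptions: the three-dimensional vectors and the names such as cat_1 are made up for illustration, and in practice the embeddings would come from the image encoder.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, gallery):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical gallery of precomputed image embeddings.
gallery = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "car_1": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
ranking = rank_by_similarity(query, gallery)
print([name for name, _ in ranking])  # ['cat_1', 'cat_2', 'car_1']
```

The same similarity score applies to image-text matching when a compatible text encoder maps captions into the same embedding space.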