ResNet50 CLIP GAP CC12M Open-Source Image Encoder - Efficient Image Feature Extraction Trained on CC12M Data

resnet50_clip_gap.cc12m

Developed by timm

A CLIP-style image encoder built on the ResNet50 architecture and trained on the CC12M dataset. It produces image features through Global Average Pooling (GAP).

Image Classification

Transformers

Open Source License: Apache-2.0 #CLIP feature extraction #Zero-shot classification #Multimodal pretraining

Downloads 19

Release Time: 12/26/2024

Model Overview

This is an image feature extraction model from the timm library. It combines the ResNet50 architecture with CLIP-style training and is intended for image representation learning.

Model Features

CLIP-style training

Trained with a contrastive learning objective similar to CLIP's, which pulls matching image-text pairs together and pushes mismatched pairs apart to improve image representations.
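To make the contrastive objective concrete, here is a minimal pure-Python sketch of a symmetric CLIP-style loss over a small batch. It is an illustration only, not the training code used for this checkpoint: the function names are hypothetical, and the temperature of 0.07 is an assumed value, since the page does not state the training recipe.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy(logits, target_index):
    """Numerically stable cross-entropy for one row of logits."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target_index]


def clip_style_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Symmetric contrastive loss: pair k of images matches pair k of texts.

    temperature=0.07 is an assumption made for this sketch.
    """
    images = [l2_normalize(v) for v in image_embeddings]
    texts = [l2_normalize(v) for v in text_embeddings]
    n = len(images)

    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text: each row should put its mass on the diagonal entry.
    loss_i2t = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    # Text-to-image: same idea applied to the columns.
    loss_t2i = sum(
        cross_entropy([logits[row][k] for row in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (loss_i2t + loss_t2i)


aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)  # True
```

The loss is small when each image embedding is closest to its own caption embedding and large when the pairs are swapped, which is the behavior the training objective rewards.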

Global Average Pooling

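Pools the final feature map with GAP (Global Average Pooling) rather than the attention pooling used in the original CLIP ResNet image tower, yielding one fixed-length vector per image that is well suited to feature extraction tasks.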
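As a rough illustration of what GAP does, the sketch below averages each channel of a tiny feature map over its spatial positions. The function name and the toy 2-channel, 2x2 input are made up for the example; the real model applies the same operation to a much larger feature map.

```python
def global_average_pool(feature_map):
    """Average each channel over height and width.

    feature_map: list of C channels, each an H x W nested list of floats.
    Returns a list of C floats (one value per channel).
    """
    pooled = []
    for channel in feature_map:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy input: 2 channels, each 2 x 2.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
embedding = global_average_pool(feature_map)
print(embedding)  # [2.5, 2.0]
```

Because the output length depends only on the number of channels, the resulting vector has the same size regardless of the spatial size of the feature map.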

Large-scale pretraining

Pretrained on the CC12M dataset (approximately 12 million image-text pairs).

Model Capabilities

Image feature extraction

Visual representation learning

Image embedding generation
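A minimal sketch of generating an embedding with timm might look like the following. It assumes a recent timm release that registers this checkpoint, plus torch and Pillow installed, and network access to download the weights on first use. The helper name extract_embedding and the image path passed to it are hypothetical.

```python
MODEL_NAME = "resnet50_clip_gap.cc12m"


def extract_embedding(image_path, model_name=MODEL_NAME):
    """Return a 1-D feature tensor for the image at image_path.

    Dependencies are imported inside the function so that defining it
    does not require timm, torch, or Pillow to be installed.
    """
    import timm
    import torch
    from PIL import Image

    # num_classes=0 asks timm for pooled features without a classifier layer.
    model = timm.create_model(model_name, pretrained=True, num_classes=0)
    model.eval()

    # Build the preprocessing pipeline from the model's own data config.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0)  # add a batch dimension

    with torch.no_grad():
        features = model(batch)
    return features[0]
```

Calling extract_embedding("example.jpg") with a local image file would return one feature vector for that image. For many images, it is better to create the model and transform once and reuse them.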

Use Cases

Computer vision

Image retrieval

Extract image features and compare them, for example by cosine similarity, to find visually similar images.
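The retrieval step can be sketched in pure Python as a cosine-similarity ranking over stored embeddings. The file names and 3-dimensional vectors below are made-up placeholders; in practice the embeddings would come from the model and a vector index would replace the linear scan for large galleries.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, gallery, top_k=3):
    """Rank gallery items by similarity to the query, best match first."""
    scored = [
        (name, cosine_similarity(query, embedding))
        for name, embedding in gallery.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholder embeddings standing in for real model outputs.
gallery = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

results = most_similar(query, gallery, top_k=2)
print([name for name, _ in results])  # ['cat_1.jpg', 'cat_2.jpg']
```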

Multimodal learning

Serves as the visual encoder in tasks such as image-text matching.

Property	Details
Tags	image-feature-extraction, timm, transformers
Library Name	timm
License	apache-2.0

