CLIP ViT B 32 CommonPool.M.clip S128m B4k
Developed by laion
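Zero-shot image classification model based on the CLIP architecture; the name suggests it was trained on a CLIP-score-filtered subset of the medium-scale CommonPool dataset (128M samples seen, batch size 4k)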
Downloads 164
Release Time: 4/26/2023
Model Overview
This model is a vision-language model based on the CLIP architecture, capable of zero-shot image classification. It pairs a Vision Transformer image encoder (ViT-B/32) with a text encoder; both are trained jointly on a large set of image-text pairs with a contrastive objective.
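To make the zero-shot step concrete, here is a minimal sketch of how a classification decision is derived once the two encoders have produced embeddings. The 3-dimensional vectors are made-up stand-ins for real encoder outputs, the helper names are hypothetical, and the logit scale of 100 is an assumption borrowed from common CLIP setups rather than a value confirmed for this checkpoint.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Pick the label whose text embedding is most similar to the image embedding.

    logit_scale is an assumed temperature; real checkpoints learn their own value.
    """
    img = normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    probs = softmax([logit_scale * s for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Toy embeddings standing in for encoder outputs (illustrative values only).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_emb = [0.8, 0.3, 0.1]

label, probs = zero_shot_classify(image_emb, text_embs, labels)
print(label)
```

In practice the candidate labels are written as short prompts, so changing the class set only requires encoding new text, not retraining.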
Model Features
Zero-shot learning capability
Performs image classification tasks without task-specific fine-tuning
CommonPool training data
The "CommonPool.M.clip" part of the name appears to refer to the training data (the medium-scale CommonPool image-text pool, filtered by CLIP score), not to a pooling layer or pooling strategy
Vision-language alignment
Aligns visual and textual representations into the same space through contrastive learning
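The alignment described above is usually trained with a symmetric contrastive loss: each image should score highest against its own caption, and each caption against its own image. The sketch below is a plain-Python illustration of that idea, assuming unit-length embeddings and a hypothetical temperature of 10; it is not taken from this model's training code.

```python
import math

def log_softmax_at(row, target):
    """Log-probability of the target index under a softmax over row."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return row[target] - log_sum

def clip_contrastive_loss(image_embs, text_embs, logit_scale=10.0):
    """Symmetric cross-entropy over the image-text similarity matrix.

    Assumes image_embs[i] and text_embs[i] form a matching pair and that
    all embeddings are already unit-length.
    """
    n = len(image_embs)
    logits = [
        [logit_scale * sum(a * b for a, b in zip(img, txt)) for txt in text_embs]
        for img in image_embs
    ]
    # Image-to-text: each row should peak on the diagonal.
    loss_i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should peak on the diagonal.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

# Toy batch: two images and two captions in a 2-D embedding space.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = clip_contrastive_loss(images, aligned_texts)
swapped_loss = clip_contrastive_loss(images, swapped_texts)
print(aligned_loss < swapped_loss)
```

Matching pairs give a loss near zero, while swapped captions are penalized heavily, which is what pushes both encoders toward a shared embedding space.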
Model Capabilities
Zero-shot image classification
Image-text matching
Cross-modal retrieval
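Image-text matching and cross-modal retrieval both reduce to ranking by cosine similarity in the shared space. The following sketch ranks a small gallery of image embeddings against one text query; the file names, vectors, and the `retrieve` helper are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_emb, gallery, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine(query_emb, emb)) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy image embeddings; a real gallery would be produced by the image encoder.
gallery = {
    "beach.jpg": [0.9, 0.1, 0.2],
    "city.jpg": [0.1, 0.9, 0.3],
    "forest.jpg": [0.2, 0.2, 0.9],
}
# Stand-in for the text encoder's output for a query such as "a sandy beach".
query_emb = [0.85, 0.2, 0.1]

results = retrieve(query_emb, gallery)
for name, score in results:
    print(name, round(score, 3))
```

Swapping the roles (an image embedding as the query, text embeddings as the gallery) gives image-to-text retrieval with the same code.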
Use Cases
Content moderation
Automatic content filtering
Automatically identifies inappropriate content based on text descriptions
E-commerce
Product image classification
Automatically classifies product images based on descriptions
Media analysis
Image tagging
Assigns descriptive labels to images by ranking candidate label texts against each image
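For filtering-style use cases such as content moderation, one simple approach is to compare each image against a fixed list of policy descriptions and flag it when any similarity crosses a threshold. This is a sketch under stated assumptions: the descriptions, the vectors, and the 0.8 threshold are illustrative, and a usable threshold would have to be calibrated on labeled examples.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def flag_image(image_emb, policy_embs, threshold=0.8):
    """Return the policy descriptions whose similarity meets the threshold.

    threshold is a hypothetical value chosen for the toy vectors below.
    """
    hits = {}
    for description, emb in policy_embs.items():
        score = cosine(image_emb, emb)
        if score >= threshold:
            hits[description] = score
    return hits

# Toy text embeddings for two policy descriptions (illustrative values only).
policy_embs = {
    "graphic violence": [0.9, 0.1, 0.1],
    "weapons for sale": [0.1, 0.9, 0.2],
}

flagged = flag_image([0.8, 0.2, 0.1], policy_embs)
clean = flag_image([0.1, 0.1, 0.9], policy_embs)
print(sorted(flagged), sorted(clean))
```

The same pattern covers product categorization or tagging: replace the policy descriptions with category or tag prompts and keep every label that clears the threshold.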