CLIP ViT B 32 CommonPool.M S128m B4k

Developed by laion
A zero-shot image classification model based on the CLIP architecture that supports general vision-language tasks
Downloads 79
Release Time: 4/26/2023

Model Overview

This model is part of the OpenCLIP project. It uses the ViT-B-32 architecture and is trained with contrastive learning so that images and text are embedded in a shared representation space, where matching image-text pairs lie close together. It is suitable for tasks such as zero-shot image classification and cross-modal retrieval.
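
To make the contrastive training idea concrete, here is a minimal sketch of a symmetric image-text contrastive loss in plain Python. It is an illustration of the general CLIP-style objective, not the project's training code. The function names and the `logit_scale` value of 10.0 are assumptions chosen for the example, and the toy 2-dimensional vectors stand in for real encoder outputs.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(image_embs, text_embs, logit_scale=10.0):
    # Hypothetical helper: image_embs[k] and text_embs[k] form a matching pair.
    # logit_scale is an illustrative temperature, not a value from the model.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [
        [logit_scale * sum(a * b for a, b in zip(i, t)) for t in txts]
        for i in imgs
    ]
    # Image-to-text direction: each image should pick out its own caption.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: each caption should pick out its own image.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Toy batch of two pairs (made-up numbers).
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.1], [0.1, 1.0]]
matched = contrastive_loss(images, texts)
shuffled = contrastive_loss(images, texts[::-1])
```

With correctly paired embeddings the loss is small, and it grows when the pairs are shuffled, which is the signal that pulls matching images and texts together during training.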

Model Features

Zero-shot Learning Capability: Recognizes new categories directly, without task-specific fine-tuning
Cross-modal Understanding: Encodes both images and text so that the two can be compared and matched
Large-scale Pretraining: Trained on 128M samples with a batch size of 4K (the S128m and B4k in the model name), which supports generalization to categories not seen as explicit labels
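
The zero-shot capability described above can be sketched as follows: embed the image, embed one text prompt per candidate label, and take a softmax over the scaled cosine similarities. This is a minimal sketch in plain Python with made-up 3-dimensional embeddings. In practice the vectors would come from the model's image and text encoders (for example through the open_clip library), and the helper names and the `logit_scale` of 100.0 are assumptions for illustration.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_emb, label_embs, logit_scale=100.0):
    # Hypothetical helper: label_embs maps a text prompt to its embedding.
    # Returns a probability per prompt via a softmax over scaled similarities.
    labels = list(label_embs)
    logits = [logit_scale * cosine(image_emb, label_embs[l]) for l in labels]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {l: e / total for l, e in zip(labels, exps)}

# Toy embeddings (made-up numbers, not real model outputs).
image_embedding = [0.9, 0.1, 0.0]
prompt_embeddings = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}
probs = zero_shot_classify(image_embedding, prompt_embeddings)
best_label = max(probs, key=probs.get)
```

Adding a new category only requires adding another prompt to the dictionary, which is why no fine-tuning is needed.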

Model Capabilities

Zero-shot Image Classification
Cross-modal Retrieval
Image-Text Matching
Multimodal Feature Extraction

Use Cases

Content Moderation
Inappropriate Content Detection: Detect inappropriate image content via text descriptions

E-commerce
Product Image Search: Match product images using natural language queries
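
Both use cases reduce to comparing embeddings. The sketch below, in plain Python, ranks a hypothetical product catalog against a text query and flags an image whose similarity to a policy description exceeds a threshold. The item names, the embeddings, and the 0.8 threshold are all invented for illustration. A real deployment would use embeddings produced by the model and a threshold tuned on labeled data.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search_images(query_emb, catalog, top_k=3):
    # Rank catalog images by similarity to the text query embedding.
    scored = [(item_id, cosine(query_emb, emb)) for item_id, emb in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def flag_image(image_emb, policy_embs, threshold=0.8):
    # Return the policy descriptions the image matches at or above the threshold.
    # The threshold is an assumed value, not one recommended by the model authors.
    return [
        name for name, emb in policy_embs.items()
        if cosine(image_emb, emb) >= threshold
    ]

# Toy data (made-up numbers, not real model outputs).
catalog = {
    "red_sneaker": [0.9, 0.1, 0.0],
    "blue_jacket": [0.1, 0.9, 0.0],
    "red_boot": [0.7, 0.0, 0.3],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of "red shoes"
results = search_images(query, catalog, top_k=2)

policies = {"graphic violence": [0.0, 0.0, 1.0]}
flags = flag_image([0.0, 0.1, 0.9], policies)
```

For large catalogs, the image embeddings would typically be precomputed once and stored in a vector index, so only the query text needs to be encoded at search time.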