CLIP ViT-B-32 CommonPool.M s128M b4K
Zero-shot image classification model based on CLIP architecture, supporting general vision-language tasks
Downloads: 79
Release Date: April 26, 2023
Model Overview
This model is part of the OpenCLIP project. It uses the ViT-B-32 architecture and was trained with contrastive learning on the CommonPool.M dataset to produce a joint representation of images and text, making it suitable for tasks such as zero-shot image classification and cross-modal retrieval.
Model Features
Zero-shot Learning Capability
Can be applied directly to recognizing new categories, without task-specific fine-tuning
Cross-modal Understanding
Processes visual and textual inputs jointly, enabling image-text matching
Large-scale Pretraining
Trained on 128M samples (s128M) with a batch size of 4K (b4K), as encoded in the model name, giving strong generalization
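The zero-shot classification this model performs reduces to comparing a normalized image embedding against normalized text embeddings of candidate class prompts. A minimal sketch of that scoring step is below; the 512-dimensional vectors and the 100x logit scale follow common CLIP conventions, and the random vectors are placeholders for real encoder outputs, so treat this as illustrative rather than as the model's exact API.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Score one image embedding against N class-prompt embeddings.

    Returns a probability distribution over the N candidate classes.
    """
    # L2-normalize, as CLIP does before computing similarities
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = 100.0 * (txt @ img)          # scaled cosine similarities
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

# Dummy embeddings stand in for real encoder outputs.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
text_embs = rng.normal(size=(3, 512))     # e.g. 3 candidate class prompts
probs = zero_shot_classify(image_emb, text_embs)
```

The predicted class is simply `probs.argmax()`; with real embeddings, the text prompts are usually templated strings such as "a photo of a {label}".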
Model Capabilities
Zero-shot Image Classification
Cross-modal Retrieval
Image-Text Matching
Multimodal Feature Extraction
Use Cases
Content Moderation
Inappropriate Content Detection
Detect inappropriate image content via text descriptions
E-commerce
Product Image Search
Match product images using natural language queries
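The retrieval use cases above follow the same pattern: embed the text query once, then rank a catalog of precomputed image embeddings by cosine similarity. A minimal sketch, again using random vectors as placeholders for real CLIP encoder outputs:

```python
import numpy as np

def rank_images(query_emb, image_embs, top_k=3):
    """Rank catalog image embeddings against one text-query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ q                       # cosine similarity per image
    order = np.argsort(-sims)[:top_k]     # indices of best matches first
    return order, sims[order]

# Dummy embeddings stand in for real encoder outputs.
rng = np.random.default_rng(1)
query_emb = rng.normal(size=512)          # e.g. "red leather handbag"
catalog = rng.normal(size=(10, 512))      # 10 product images
order, scores = rank_images(query_emb, catalog)
```

Because image embeddings can be computed offline and stored, only the short text query needs encoding at search time, which is what makes this practical for e-commerce search and moderation pipelines.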