
CLIP-ViT-B-32-laion2B-s34B-b79K

Developed by LAION
A vision-language model trained with the OpenCLIP framework on LAION-2B, the English subset of LAION-5B, supporting zero-shot image classification and cross-modal retrieval
Downloads: 1.1M
Release Date: September 14, 2022

Model Overview

This model is a variant of the CLIP architecture with a ViT-B/32 visual encoder, trained on LAION-2B, the 2-billion-sample English subset of LAION-5B. It is used primarily by the research community to explore zero-shot image classification and cross-modal understanding tasks.
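
As a concrete starting point, the checkpoint can be loaded through the open_clip_torch package. This is a minimal sketch; the model and pretrained tags below follow the names published for this checkpoint on the Hugging Face Hub:

```python
import open_clip

# Load the ViT-B/32 checkpoint trained on LAION-2B
# (s34B = 34B samples seen, b79K = global batch size of 79K).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()  # the card targets zero-shot inference, not fine-tuning
```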

Model Features

Zero-shot learning capability
Classifies images without task-specific fine-tuning, using natural-language label prompts
Cross-modal understanding
Encodes images and text into a shared embedding space, so the two modalities can be compared directly
Large-scale training data
Trained on LAION-2B, a dataset of 2 billion English image-text pairs

Model Capabilities

Zero-shot image classification
Image-text matching
Cross-modal retrieval
Image feature extraction
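
A minimal sketch of the zero-shot classification and image-text matching capabilities listed above, following the standard OpenCLIP usage pattern. The image path and candidate labels are placeholders:

```python
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Hypothetical inputs: any image file and a free-form set of candidate labels.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product is cosine similarity, then softmax
    # over the scaled similarities to get per-label probabilities.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```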

Use Cases

Research applications
Zero-shot image classification research
Exploring the model's classification capability on unseen categories
Achieves 66.6% zero-shot top-1 accuracy on ImageNet-1k
Cross-modal understanding research
Investigating the association mechanisms between visual and language modalities
Potential applications
Image retrieval systems
Retrieving relevant images based on text descriptions
Content moderation assistance
Identifying specific content in images
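
As a rough illustration of the image-retrieval use case above, one possible pattern is to pre-embed a gallery of images and rank it against a text query. The file paths and query string are placeholders; in practice the gallery embeddings would be precomputed and stored in a vector index:

```python
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Hypothetical gallery of images to search over.
paths = ["img0.jpg", "img1.jpg", "img2.jpg"]
with torch.no_grad():
    gallery = torch.cat(
        [model.encode_image(preprocess(Image.open(p)).unsqueeze(0)) for p in paths]
    )
    gallery /= gallery.norm(dim=-1, keepdim=True)

    # Embed the text query into the same space and rank by cosine similarity.
    query = model.encode_text(tokenizer(["a red sports car"]))
    query /= query.norm(dim=-1, keepdim=True)
    scores = (query @ gallery.T).squeeze(0)

for idx in scores.argsort(descending=True).tolist():
    print(paths[idx], f"score={scores[idx].item():.3f}")
```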