
CLIP-ViT-B-32-laion2B-s34B-b79K

Developed by rroset
A CLIP ViT-B/32 model trained on the LAION-2B dataset with the OpenCLIP framework, supporting zero-shot image classification and cross-modal retrieval tasks
Downloads 48
Release Time: 6/25/2024

Model Overview

This is a vision-language pretrained model capable of understanding the relationship between images and text, supporting tasks such as zero-shot image classification and image-text retrieval.
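The mechanism behind this can be sketched in a few lines: the image and each candidate caption are embedded into a shared space, and the caption whose embedding has the highest cosine similarity to the image embedding wins. A minimal sketch with toy 4-dimensional vectors standing in for the model's real 512-dimensional embeddings (the vectors and labels below are illustrative, not model outputs):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Pick the label whose text embedding is most similar to the image embedding."""
    # L2-normalize so the dot product equals cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                                        # cosine similarity per label
    probs = np.exp(100 * sims) / np.exp(100 * sims).sum()   # CLIP-style temperature-scaled softmax
    return labels[int(np.argmax(probs))], probs

# Toy embeddings (the real model produces 512-d vectors via its image/text encoders)
image_emb = np.array([0.9, 0.1, 0.0, 0.1])
text_embs = np.array([
    [0.8, 0.2, 0.1, 0.0],   # e.g. "a photo of a cat"
    [0.0, 0.1, 0.9, 0.3],   # e.g. "a photo of a dog"
])
label, probs = zero_shot_classify(image_emb, text_embs, ["cat", "dog"])
print(label)  # → cat
```

Because classification is just similarity against text embeddings, swapping in a new label set requires only encoding new captions, which is what makes the approach zero-shot.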

Model Features

Zero-shot learning capability
Can perform image classification on new categories without task-specific fine-tuning
Cross-modal understanding
Capable of processing both images and text, understanding semantic relationships between them
Large-scale pretraining
Trained on the LAION-2B dataset (2 billion samples), with strong generalization capabilities

Model Capabilities

Zero-shot image classification
Image-text retrieval
Cross-modal representation learning
Image feature extraction
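The retrieval capabilities above reduce to the same similarity computation: embed the text query, embed each image in the collection, and rank images by cosine similarity. A minimal sketch with synthetic embeddings (in practice these would come from the model's text and image encoders; the vectors here are illustrative):

```python
import numpy as np

def rank_images(query_emb, image_embs):
    """Return image indices sorted from most to least similar to the text query."""
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ q
    return np.argsort(-sims)  # indices in order of descending similarity

# Synthetic gallery of three precomputed image embeddings
image_embs = np.array([
    [0.1, 0.9, 0.0],
    [0.9, 0.1, 0.1],
    [0.5, 0.5, 0.0],
])
query_emb = np.array([1.0, 0.0, 0.0])  # embedding of a text query
order = rank_images(query_emb, image_embs)
print(order.tolist())  # → [1, 2, 0]
```

Image embeddings can be computed once and cached, so serving a new text query costs only one text-encoder pass plus a matrix-vector product over the gallery.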

Use Cases

Content retrieval
Image search
Search for relevant images using text queries
Image understanding
Zero-shot classification
Classify images of new categories without training
66.6% zero-shot top-1 accuracy on ImageNet-1k
Research applications
Cross-modal research
Study the relationship between vision and language modalities
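For the zero-shot use case above, CLIP-style models are usually queried with natural-language prompts rather than bare class names ("a photo of a cat" instead of "cat"), and evaluations often average scores over many template variants. A small sketch of expanding a label set into prompts; the templates here are illustrative examples, not the exact set used in any reported evaluation:

```python
# Illustrative prompt templates; real evaluations use larger curated sets
TEMPLATES = [
    "a photo of a {}.",
    "a blurry photo of a {}.",
    "a close-up photo of a {}.",
]

def build_prompts(labels):
    """Expand each class label into one prompt per template for text encoding."""
    return {label: [t.format(label) for t in TEMPLATES] for label in labels}

prompts = build_prompts(["cat", "dog"])
print(prompts["cat"][0])  # → a photo of a cat.
```

Each prompt is then passed through the text encoder, and the per-template text embeddings for a label are typically averaged before computing similarities.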