
CLIP ViT-B-32 laion2B-s34B-b79K

Developed by recallapp
A vision-language model trained with the OpenCLIP framework on the English subset of the LAION-2B dataset, supporting zero-shot image classification and cross-modal retrieval.
Downloads 17
Release Time: 1/12/2025

Model Overview

This model is a variant of the CLIP architecture with a ViT-B/32 visual encoder. It is trained on image-text pairs via contrastive learning, which enables zero-shot image classification and cross-modal retrieval without task-specific fine-tuning.
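The contrastive setup described above can be sketched with mock embeddings. This is a minimal illustration, not the model's actual code: the function name `zero_shot_classify`, the toy 4-dimensional vectors, and the temperature value are assumptions for brevity (the real model uses 512-dimensional embeddings produced by its image and text encoders).

```python
import numpy as np

def normalize(x):
    # Project each row onto the unit sphere so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def zero_shot_classify(image_emb, text_embs, temperature=100.0):
    """Rank candidate class prompts by cosine similarity to an image embedding.

    CLIP-style zero-shot classification compares one image embedding against one
    text embedding per candidate label ("a photo of a dog", ...) and applies a
    softmax over the temperature-scaled similarities.
    """
    image_emb = normalize(image_emb)
    text_embs = normalize(text_embs)
    logits = temperature * text_embs @ image_emb  # one score per candidate label
    exp = np.exp(logits - logits.max())           # numerically stable softmax
    return exp / exp.sum()

# Toy embeddings standing in for real encoder outputs.
image = np.array([0.9, 0.1, 0.0, 0.1])
labels = np.array([
    [1.0, 0.0, 0.0, 0.0],  # close to the image embedding
    [0.0, 1.0, 0.0, 0.0],  # far from it
])
probs = zero_shot_classify(image, labels)
print(probs.argmax())  # → 0
```

Because the encoders map both modalities into one shared space, adding a new category only requires embedding a new text prompt; no retraining is involved.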

Model Features

Zero-shot learning capability
Can perform image classification on new categories without task-specific fine-tuning
Cross-modal understanding
Capable of mapping visual and textual information into a shared embedding space
Large-scale training
Trained on the LAION-2B dataset (2 billion image-text pairs)

Model Capabilities

Zero-shot image classification
Image-text matching
Cross-modal retrieval
Image feature extraction

Use Cases

Content retrieval
Text-based image search
Retrieve relevant images using natural language queries
Image classification
Zero-shot classification
Classify new categories without training
Achieves 66.6% zero-shot top-1 accuracy on ImageNet-1k
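The text-based image search use case above can be sketched end-to-end with mock embeddings. This is an illustrative sketch under stated assumptions: `top_k_images`, the index size, and the embedding dimension are hypothetical, and a real pipeline would produce the vectors with the model's image and text encoders.

```python
import numpy as np

def top_k_images(query_emb, index_embs, k=3):
    """Return indices and scores of the k images most similar to a text query.

    Cross-modal retrieval with a CLIP-style model: embed every image once to
    build an index, then rank the index by cosine similarity against each
    incoming text-query embedding.
    """
    q = query_emb / np.linalg.norm(query_emb)
    idx = index_embs / np.linalg.norm(index_embs, axis=1, keepdims=True)
    sims = idx @ q                       # cosine similarity per indexed image
    order = np.argsort(-sims)[:k]        # best matches first
    return order, sims[order]

rng = np.random.default_rng(0)
index = rng.normal(size=(100, 8))               # 100 mock image embeddings
query = index[42] + 0.01 * rng.normal(size=8)   # a text query near image 42
ranked, scores = top_k_images(query, index)
print(ranked[0])  # → 42
```

Precomputing and normalizing the image index once keeps per-query cost to a single matrix-vector product, which is why this pattern scales to large galleries.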
© 2025 AIbase