
CLIP ViT-bigG/14 LAION-2B 39B b160k

Developed by LAION
A vision-language model trained on the LAION-2B dataset with the OpenCLIP framework, supporting zero-shot image classification and cross-modal retrieval
Downloads 565.80k
Release Time: 1/23/2023

Model Overview

This is a CLIP model based on the ViT-bigG/14 architecture, trained with the OpenCLIP framework on LAION-2B, the 2-billion-sample English subset of LAION-5B. The model learns the semantic relationship between images and text, supporting zero-shot image classification and cross-modal retrieval tasks.
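
The snippet below is a minimal sketch of loading this checkpoint with OpenCLIP and running zero-shot classification; it assumes the open_clip_torch package is installed, and the image path and candidate labels are illustrative placeholders.

import torch
import open_clip
from PIL import Image

# Load the ViT-bigG/14 model with the LAION-2B (39B samples, batch 160k) weights
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k"
)
tokenizer = open_clip.get_tokenizer("ViT-bigG-14")
model.eval()

# "cat.jpg" and the label set are placeholders for illustration
image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Unit-normalize so the dot product is cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Scale similarities (mirroring CLIP's logit-scale convention) and softmax
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")

Because the candidate labels are supplied at inference time as plain text, swapping in a new label set requires no retraining, which is what makes the classification "zero-shot".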

Model Features

Zero-shot Learning Capability
Performs image classification on new categories without task-specific fine-tuning
Cross-modal Understanding
Maps images and text into a shared embedding space, capturing their semantic relationship
Large-scale Training
Trained on the LAION-2B English dataset of 2 billion image-text pairs
High Performance
Achieves 80.1% zero-shot top-1 accuracy on ImageNet-1k

Model Capabilities

Zero-shot image classification
Image-text retrieval
Cross-modal semantic understanding
Image feature extraction (sketched after this list)
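
As a sketch of the feature-extraction capability, the snippet below encodes a batch of images into unit-normalized embeddings that can be stored for downstream retrieval or clustering; it reuses the model setup from the earlier example, and the file paths are placeholders.

import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k"
)
model.eval()

# Placeholder paths; in practice this would iterate over a real image collection
paths = ["photo1.jpg", "photo2.jpg"]
batch = torch.stack([preprocess(Image.open(p)) for p in paths])

with torch.no_grad():
    embeddings = model.encode_image(batch)               # shape: (len(paths), embed_dim)
    embeddings /= embeddings.norm(dim=-1, keepdim=True)  # unit-norm for cosine similarity

print(embeddings.shape)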

Use Cases

Image Understanding
Zero-shot Image Classification
Classify images from new categories without task-specific training
Achieves 80.1% zero-shot top-1 accuracy on ImageNet-1k
Image Retrieval
Retrieve relevant images from text descriptions (see the sketch after this list)
Research Applications
Multimodal Research
Used for research on joint vision-language representation learning
Model Fine-tuning Base
Serves as a pretrained model for downstream tasks
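
A minimal text-to-image retrieval sketch for the use case above: a small image collection is ranked against a text query by cosine similarity between the text embedding and precomputed image embeddings. The paths and query string are illustrative placeholders.

import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained="laion2b_s39b_b160k"
)
tokenizer = open_clip.get_tokenizer("ViT-bigG-14")
model.eval()

# Placeholder image collection and query
paths = ["beach.jpg", "city.jpg", "forest.jpg"]
images = torch.stack([preprocess(Image.open(p)) for p in paths])
query = tokenizer(["a sunny beach with palm trees"])

with torch.no_grad():
    img_emb = model.encode_image(images)
    txt_emb = model.encode_text(query)
    img_emb /= img_emb.norm(dim=-1, keepdim=True)
    txt_emb /= txt_emb.norm(dim=-1, keepdim=True)
    scores = (txt_emb @ img_emb.T).squeeze(0)  # cosine similarity per image

# Print images ranked from most to least relevant to the query
for p, s in sorted(zip(paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{p}: {s:.3f}")

At scale, the image embeddings would be computed once and indexed (e.g., in an approximate-nearest-neighbor store), so each query only requires a single text encoding plus a similarity search.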