ViT-L-16-HTxt-Recap-CLIP

Developed by UCSC-VLAA
A CLIP model trained on the Recap-DataComp-1B dataset with LLaMA-3-generated captions, suitable for zero-shot image classification tasks
Downloads 538
Release Time: 6/13/2024

Model Overview

A contrastive image-text model trained on web images whose captions were regenerated with LLaMA-3, intended for zero-shot image classification

Model Features

LLaMA-3 Relabeling
Billions of web images are relabeled with LLaMA-3-generated captions, and the model is trained on the relabeled image-text pairs
Large-scale Training
Trained on the large-scale Recap-DataComp-1B dataset
Zero-shot Capability
Can be directly applied to various image classification tasks without fine-tuning

Model Capabilities

Zero-shot image classification
Image-text matching
Cross-modal feature extraction
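
The capabilities above all reduce to comparing image and text embeddings in a shared space. The following is a minimal sketch of that scoring step, not the model's actual inference code: the 3-dimensional embeddings are hand-written placeholders standing in for the outputs of the image and text encoders, the labels "a dog" and "a cat" are hypothetical, and the logit scale of 100 is an assumed value rather than one read from this checkpoint.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled image-to-text similarities.

    logit_scale=100.0 is an assumption for illustration only.
    """
    logits = [logit_scale * cosine_similarity(image_emb, t) for t in text_embs]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder embeddings (hypothetical values, not produced by the model).
labels = ["a French donut", "a dog", "a cat"]
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = zero_shot_probs(image_emb, text_embs)
best_label = labels[probs.index(max(probs))]
```

Zero-shot classification picks the label with the highest probability for one image. Image-text matching uses the same similarity in the other direction, ranking several images against one text, and feature extraction simply stops at the normalized embeddings.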

Use Cases

Image Understanding
Image Classification
Classifies images against user-supplied candidate labels without task-specific training
The example reports 100% accuracy in classifying 'French donut' images
Content Moderation
Inappropriate Content Detection
Identifies inappropriate content in images
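
One way the moderation use case could be built on zero-shot label probabilities is a simple threshold rule, sketched below. Everything here is an assumption for illustration: the function name `flag_image`, the label set, the probability values, and the 0.5 threshold are hypothetical and would need tuning and evaluation on real data before any production use.

```python
def flag_image(label_probs, unsafe_labels, threshold=0.5):
    """Flag an image when the total probability on unsafe labels reaches the threshold.

    label_probs: mapping of candidate label -> zero-shot probability.
    unsafe_labels: set of labels treated as inappropriate (hypothetical).
    Returns (flagged, unsafe_probability_mass).
    """
    unsafe_mass = sum(p for label, p in label_probs.items() if label in unsafe_labels)
    return unsafe_mass >= threshold, unsafe_mass


# Hypothetical probabilities, as a zero-shot classifier might return them.
label_probs = {
    "violent content": 0.62,
    "explicit content": 0.08,
    "benign photo": 0.30,
}
flagged, score = flag_image(label_probs, {"violent content", "explicit content"})
```

Summing over all unsafe labels rather than checking each one separately avoids missing images whose probability is split across several related categories.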