# Model card for Recap-CLIP-ViT-L-16-Txt-Huge-2.56B
A CLIPA model trained on Recap-DataComp-1B, designed for zero-shot image classification.
## Quick Start
The Recap-CLIP-ViT-L-16-Txt-Huge-2.56B model is a contrastive image-text model for zero-shot image classification. It is trained on the Recap-DataComp-1B dataset and can be loaded directly through OpenCLIP, as shown in the usage example below.
⨠Features
- Model Type: Contrastive Image - Text, Zero - Shot Image Classification.
- Original: https://github.com/UCSC-VLAA/Recap-DataComp-1B
- Dataset: https://huggingface.co/datasets/UCSC-VLAA/Recap-DataComp-1B
- Papers:
- What If We Recaption Billions of Web Images with LLaMA - 3?: https://arxiv.org/abs/2406.08478
| Property | Details |
|---|---|
| Model Type | Contrastive Image-Text, Zero-Shot Image Classification |
| Training Data | https://huggingface.co/datasets/UCSC-VLAA/Recap-DataComp-1B |
## Installation
The usage example below relies on PyTorch, Pillow, and OpenCLIP (the `open_clip_torch` package).
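If these dependencies are not already available, they can typically be installed from PyPI (package names assumed to be the standard distributions for each library):

```
pip install torch pillow open_clip_torch
```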
## Usage Examples
### Basic Usage
```python
import torch
import torch.nn.functional as F
from urllib.request import urlopen
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

# Load the pretrained model, its preprocessing transform, and the matching tokenizer from the Hub.
model, preprocess = create_model_from_pretrained('hf-hub:UCSC-VLAA/ViT-L-16-HTxt-Recap-CLIP')
tokenizer = get_tokenizer('hf-hub:UCSC-VLAA/ViT-L-16-HTxt-Recap-CLIP')

# Download an example image and turn it into a batch of one preprocessed tensor.
image = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
image = preprocess(image).unsqueeze(0)

# Tokenize the candidate labels for zero-shot classification.
text = tokenizer(["a diagram", "a dog", "a cat", "a beignet"], context_length=model.context_length)

with torch.no_grad(), torch.cuda.amp.autocast():
    # Encode both modalities and L2-normalize the embeddings.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Scaled cosine similarity followed by a softmax over the candidate labels.
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```
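The printed tensor contains one probability per candidate label. As a small follow-up sketch that continues directly from the snippet above (the label list simply repeats the example prompts), the best-matching label can be read off with `argmax`:

```python
# Continuing from the example above: report the most likely label.
labels = ["a diagram", "a dog", "a cat", "a beignet"]
top_idx = text_probs.argmax(dim=-1).item()  # index of the highest-probability label
print(f"Predicted label: {labels[top_idx]} (p={text_probs[0, top_idx].item():.3f})")
```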
## Documentation
### Bias, Risks, and Limitations
This model is trained on an image-text dataset with captions generated by LLaVA-1.5-LLaMA3-8B, which may still contain biases and inaccuracies inherent in the original web-crawled data. Users should be aware of these biases, risks, and limitations when using the model. Check the dataset card page for more details.
### Important Note
This model may have biases and inaccuracies due to the original web-crawled data. Check the dataset card for more details.
## License
This model is licensed under CC-BY-4.0.
## Citation
```bibtex
@article{li2024recaption,
  title   = {What If We Recaption Billions of Web Images with LLaMA-3?},
  author  = {Xianhang Li and Haoqin Tu and Mude Hui and Zeyu Wang and Bingchen Zhao and Junfei Xiao and Sucheng Ren and Jieru Mei and Qing Liu and Huangjie Zheng and Yuyin Zhou and Cihang Xie},
  journal = {arXiv preprint arXiv:2406.08478},
  year    = {2024}
}
```
## Model Contact
zwang615@ucsc.edu