TiC-CLIP-bestpool-oracle Model Card
This repository offers TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, bestpool filtering) with data from 2014 to 2022 using our modified OpenCLIP code. For more details, visit our GitHub repo.
⨠Features
- Time-Continual Benchmarks: Introduces the first web-scale Time-Continual (TiC) benchmarks for vision-language models, including TiC-DataComp, TiC-YFCC, and TiC-Redcaps.
- Cost-Effective Training: Demonstrates a rehearsal-based approach that reduces compute by 2.5× compared to retraining from scratch.
📦 Installation
The models are compatible with the DataComp evaluation suite and our patched version of DataComp for evaluation on TiC-DataComp-Retrieval and TiC-DataCompNet. You can also use them to resume training or to initialize new training runs with the OpenCLIP code. Follow the instructions in our GitHub repo to create the evaluation sets, or use DataComp for the standard evaluations on 38 datasets.
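For inference with the released checkpoints alone, a minimal environment sketch like the following is usually sufficient; the PyPI package names are the standard ones, exact versions are left to you, and the DataComp patching steps for training and TiC evaluations are covered in the GitHub repo.
# Inference-only setup sketch; training and TiC evaluation setup follow the GitHub repo README
pip install torch torchvision open_clip_torch huggingface_hub
# For training and TiC evaluations, clone the repository and follow its instructions
git clone https://github.com/apple/ml-tic-clip.git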
💻 Usage Examples
Basic Usage
The following snippets assume the TiC-DataComp data has been prepared and that you've followed the instructions in the GitHub repo.
Training
# Pick a training year and download its checkpoint from this repository (stored as checkpoints/$YEAR.pt)
YEAR=2016
REPO="apple/TiC-CLIP-bestpool-oracle"
huggingface-cli download $REPO checkpoints/$YEAR.pt
# Continue training on the TiC-DataComp yearly data with our patched DataComp code
pushd datacomp
final_data_dir=$TIC_DATACOMP_Y_PATH/train/$YEAR/
torchrun --nproc_per_node 8 --nnodes 1 \
    train.py \
    --scale "tic_medium" \
    --dataset_resampled \
    --data_dir $final_data_dir \
    --output_dir "./results/" \
    --exp_name "datacomp_medium-basic_cumulative" \
    --imagenet_val $IMAGENET_VAL_PATH \
    --save_frequency 1 \
    --resume
popd
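If you want all yearly checkpoints locally (for example, to sweep evaluations over years), a pattern download along these lines should also work; the checkpoints/ layout matches the command above, the target directory name is arbitrary, and the --include flag requires a reasonably recent huggingface_hub.
# Download every yearly checkpoint from this repository into a local folder (sketch)
huggingface-cli download apple/TiC-CLIP-bestpool-oracle --include "checkpoints/*" --local-dir ./TiC-CLIP-bestpool-oracle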
Evaluation
pushd datacomp
# Generate the list of TiC evaluation tasks (yearly retrieval and DataCompNet splits)
python ../dataset_creation/tic-datacomp/generate_tasklist.py --yaml-path tasklist.yml --sample-eval --eval-tasks retrieval/yearly,datacompnet/yearly
# Evaluate the year-$YEAR checkpoint with the ViT-B-16 architecture
python evaluate.py --data_dir data/ --train_output_dir ./results --use_model "ViT-B-16 $YEAR.pt" --skip_hf --skip_db --skip_notification
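Because the point of the TiC benchmarks is tracking performance over time, the same evaluation is typically repeated for every yearly checkpoint. The loop below is only an illustrative wrapper around the command above; the year list is an example, and it assumes each checkpoints/$YEAR.pt has already been downloaded as in the training snippet.
# Illustrative sweep over yearly checkpoints (year list is an example)
for YEAR in 2016 2017 2018 2019 2020 2021 2022; do
    python evaluate.py --data_dir data/ --train_output_dir ./results \
        --use_model "ViT-B-16 $YEAR.pt" --skip_hf --skip_db --skip_notification
done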
OpenCLIP Load and Inference Example
import torch
from PIL import Image
import open_clip
from huggingface_hub import hf_hub_download
# Download a yearly checkpoint from this repository and load it with OpenCLIP
filename = hf_hub_download(repo_id="apple/TiC-CLIP-bestpool-oracle", filename="checkpoints/2016.pt")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16', filename)
tokenizer = open_clip.get_tokenizer('ViT-B-16')
image = preprocess(Image.open("image.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])
with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print("Label probs:", text_probs)
📚 Documentation
Model Details
Model Description
Keeping large foundation models up to date with the latest data is costly. To avoid the high cost of constant retraining, continual training is necessary. The lack of large-scale continual learning benchmarks or baselines makes this problem worse. We introduce the first web-scale Time-Continual (TiC) benchmarks for training vision-language models. Our largest dataset, TiC-DataComp, contains over 12.7B timestamped image-text pairs from 2014-2022. We use these benchmarks for dynamic evaluations and show that OpenAI's CLIP (trained on data up to 2020) loses ≈8% zero-shot accuracy on our curated retrieval task from 2021-2022 compared to more recently trained models in the OpenCLIP repository. We also study efficient training on time-continuous data and show that a rehearsal-based approach reduces compute by 2.5×.
- Developed by: Apple
- License: See LICENSE
Model Sources
- Repository: ml-tic-clip GitHub repo (https://github.com/apple/ml-tic-clip)
- Paper: TiC-CLIP: Continual Training of CLIP Models, Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F., International Conference on Learning Representations (ICLR), 2024.
Training Details
Training Data
Refer to TiC-DataComp.
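The yearly data is distributed as the apple/TiC-DataComp dataset on the Hugging Face Hub (see the table at the end of this card). A plain fetch of the release could look like the sketch below; note that, as with DataComp, the release is metadata-based and the actual image-text shards are reconstructed following the GitHub repo instructions (the local directory name is arbitrary).
# Sketch: download the TiC-DataComp release (metadata) from the Hub
huggingface-cli download apple/TiC-DataComp --repo-type dataset --local-dir ./TiC-DataComp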
Training Procedure
Refer to Sections 2-3 of our TiC-CLIP paper.
📄 License
This model is released under the custom-apple-license.
📚 Citation
TiC-CLIP: Continual Training of CLIP Models. (ICLR 2024)
Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F.
@inproceedings{garg2024tic,
  title={TiC-CLIP: Continual Training of CLIP Models},
  author={Garg, Saurabh and Farajtabar, Mehrdad and Pouransari, Hadi and Vemulapalli, Raviteja and Mehta, Sachin and Tuzel, Oncel and Shankar, Vaishaal and Faghri, Fartash},
  booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
  year={2024},
  url={https://openreview.net/forum?id=TLADT8Wrhn}
}
| Property | Details |
|----------|---------|
| Model Type | TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, bestpool filtering) |
| Training Data | apple/TiC-DataComp |
| Library Name | tic-clip |
| Tags | vision, zero-shot-image-classification |