TiC-CLIP-basic-oracle Model Card
This repository hosts TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, basic filtering) with data from 2014 to 2022 using our modified OpenCLIP code. For more details, check out our GitHub repo.
Features
Model Details
Model Description
Keeping large foundation models up to date with the latest data is inherently expensive. To avoid the prohibitive cost of repeatedly retraining from scratch, it is imperative to continually train these models. This problem is exacerbated by the lack of large-scale continual learning benchmarks and baselines.
We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps. TiC-DataComp, our largest dataset, contains over 12.7B timestamped image-text pairs spanning 9 years (2014 - 2022).
We first use our benchmarks to create various dynamic evaluations to measure the temporal robustness of existing models. We find that OpenAI's CLIP (trained on data up to 2020) loses about 8% zero-shot accuracy on our curated retrieval task from 2021 - 2022 compared to more recently trained models in the OpenCLIP repository.
Then, we study how to efficiently train models on time-continuous data. We show that a simple rehearsal-based approach, which continues training from the last checkpoint and replays old data, reduces compute by 2.5× compared to the standard practice of retraining from scratch. The code is available at https://github.com/apple/ml-tic-clip.
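To illustrate the replay idea, here is a minimal, self-contained sketch (not the actual OpenCLIP training code used for these models): the current year's data is mixed with a buffer of older data before each round of continual training. The dataset objects, helper name, and mixing ratio are illustrative assumptions only.

# Hypothetical sketch of rehearsal-style data mixing; not the training code behind these checkpoints.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset, TensorDataset

def make_rehearsal_loader(new_data, old_data, replay_fraction=0.5, batch_size=4):
    """Mix the current year's data with a random subset of older data."""
    n_replay = int(len(old_data) * replay_fraction)
    replay_idx = torch.randperm(len(old_data))[:n_replay].tolist()
    mixed = ConcatDataset([new_data, Subset(old_data, replay_idx)])
    return DataLoader(mixed, batch_size=batch_size, shuffle=True)

# Toy stand-ins for timestamped training data from the new year and previous years.
new_year = TensorDataset(torch.randn(16, 8))
old_years = TensorDataset(torch.randn(64, 8))
for (batch,) in make_rehearsal_loader(new_year, old_years):
    pass  # a continual-training step, resumed from the last checkpoint, would go here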
- Developed by: Apple
- License: See LICENSE
Model Sources
- Repository: ml-tic-clip GitHub repo
- Paper: TiC-CLIP: Continual Training of CLIP Models, Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F., International Conference on Learning Representations (ICLR), 2024.
Uses
Researchers can use TiC-CLIP pretrained models to design continual learning methods more quickly. They can start from a pretrained checkpoint and continue training on the next year or month's data.
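For example, here is a short sketch (not part of the original card) of how one might enumerate and fetch the yearly checkpoints in this repository with huggingface_hub before resuming training. The "checkpoints/" prefix assumes the checkpoints/<year>.pt layout used in the examples below.

# List the yearly checkpoints stored in this repository and download one of them.
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "apple/TiC-CLIP-basic-oracle"
checkpoints = sorted(f for f in list_repo_files(repo_id) if f.startswith("checkpoints/"))
print(checkpoints)

# Download one checkpoint to use as the starting point for continual training,
# as in the training example below.
local_path = hf_hub_download(repo_id=repo_id, filename=checkpoints[-1])
print("Downloaded to:", local_path)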
Installation
The models are compatible with the DataComp evaluation suite and our patched version of DataComp for evaluation on TiC-DataComp-Retrieval and TiC-DataCompNet. The models can also be used to resume training or as an initialization for new training using OpenCLIP code.
Please follow the instructions in our GitHub repo to create the evaluation sets or follow DataComp for standard evaluations on 38 datasets.
Usage Examples
Basic Usage
Training
YEAR=2016
REPO="apple/TiC-CLIP-basic-oracle"

# Download the checkpoint for the chosen year from the Hugging Face Hub.
huggingface-cli download $REPO checkpoints/$YEAR.pt

# Continue training with the DataComp/OpenCLIP code.
pushd datacomp
final_data_dir=$TIC_DATACOMP_Y_PATH/train/$YEAR/
# --resume points at the downloaded checkpoint; adjust the path if huggingface-cli stored it elsewhere (e.g., the HF cache).
torchrun --nproc_per_node 8 --nnodes 1 \
    train.py \
    --scale "tic_medium" \
    --dataset_resampled \
    --data_dir $final_data_dir \
    --output_dir "./results/" \
    --exp_name "datacomp_medium-basic_cumulative" \
    --imagenet_val $IMAGENET_VAL_PATH \
    --save_frequency 1 \
    --resume "checkpoints/$YEAR.pt"
popd
Evaluation
pushd datacomp
# Generate the task list for the TiC retrieval and TiC-DataCompNet yearly evaluations.
python ../dataset_creation/tic-datacomp/generate_tasklist.py --yaml-path tasklist.yml --sample-eval --eval-tasks retrieval/yearly,datacompnet/yearly
# Evaluate the chosen year's ViT-B-16 checkpoint on the generated tasks.
python evaluate.py --data_dir data/ --train_output_dir ./results --use_model "ViT-B-16 $YEAR.pt" --skip_hf --skip_db --skip_notification
popd
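After evaluation finishes, the DataComp suite stores per-task metrics under the directory passed to --train_output_dir. The exact filenames and record fields can vary, so the following inspection snippet is only a hedged sketch: it globs for JSON-lines files under the assumed datacomp/results location and prints whatever numeric fields it finds.

# Hedged sketch for skimming evaluation output; file layout and field names are assumptions.
import glob
import json

for path in sorted(glob.glob("datacomp/results/**/*.jsonl", recursive=True)):
    print(f"== {path}")
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            numeric = {k: v for k, v in record.items() if isinstance(v, (int, float))}
            print(record.get("key", record.get("dataset", "?")), numeric)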
Advanced Usage
OpenCLIP Load and Inference Example
import open_clip
import torch
from PIL import Image
from huggingface_hub import hf_hub_download

# Download a yearly checkpoint and load it with OpenCLIP.
filename = hf_hub_download(repo_id="apple/TiC-CLIP-basic-oracle", filename="checkpoints/2016.pt")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16', filename)
tokenizer = open_clip.get_tokenizer('ViT-B-16')

image = preprocess(Image.open("image.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
Technical Details
Training Data
Please refer to TiC-DataComp.
Training Procedure
Please refer to Sections 2 - 3 of our TiC-CLIP paper.
License
This project is distributed under a custom Apple license (custom-apple-license); see the LICENSE file for details.
Documentation
Citation
TiC-CLIP: Continual Training of CLIP Models (ICLR 2024).
Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V., and Faghri, F.
@inproceedings{garg2024tic,
title={TiC-CLIP: Continual Training of CLIP Models},
author={Garg, Saurabh and Farajtabar, Mehrdad and Pouransari, Hadi and Vemulapalli, Raviteja and Mehta, Sachin and Tuzel, Oncel and Shankar, Vaishaal and Faghri, Fartash},
booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
year={2024},
url={https://openreview.net/forum?id=TLADT8Wrhn}
}
Information Table
| Property | Details |
|---|---|
| Model Type | TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, basic filtering) |
| Training Data | TiC-DataComp-Yearly with data from 2014 to 2022 |
| Library Name | tic-clip |
| Tags | vision, zero-shot-image-classification |
| Datasets | apple/TiC-DataComp |
| License | custom-apple-license |
| License Link | https://github.com/apple/ml-tic-clip/blob/main/LICENSE |