TiC-CLIP-bestpool-cumulative Open-source Visual Language Model - Reduce Costs and Process Time Series Data

TiC-CLIP Bestpool Cumulative

Developed by Apple

TiC-CLIP is a vision-language model built on OpenCLIP that is trained continually on timestamped image-text data, reducing the compute cost of keeping the model up to date.

Text-to-Image Open Source License:Other #Continual Learning Vision-Language #Zero-shot Image Classification #Temporal Robustness

Release Time: 6/5/2024

Model Overview

This model is trained on TiC-DataComp, a benchmark for continual training of vision-language models that contains timestamped image-text pairs spanning 9 years (2014-2022). It supports zero-shot image classification and cross-modal retrieval tasks.

Model Features

Temporal Continual Training

Adopts a continual training strategy instead of retraining from scratch, reducing compute by 2.5× compared with the standard practice of full retraining

Large-scale Time-series Data

Based on the TiC-DataComp dataset, which contains over 12.7 billion timestamped image-text pairs from 2014-2022

Efficient Replay Strategy

Maintains model performance by continuing training from the last checkpoint and replaying old data

Model Capabilities

Zero-shot image classification

Image-text matching

Cross-modal retrieval

Continual learning

Use Cases

Computer Vision Research

Continual Learning Method Development

Researchers can use this model to accelerate development of continual learning methods

Start from the pre-trained checkpoints and continue training on subsequent yearly or monthly data

Cross-modal Applications

Image Retrieval Systems

Building time-series based image retrieval systems

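On the curated 2021-2022 retrieval task, OpenAI's CLIP (trained on data up to 2020) scores ≈8% lower in zero-shot accuracy than more recently trained OpenCLIP models, which motivates continually updated models such as TiC-CLIP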

🚀 TiC-CLIP-bestpool-cumulative

🚀 Quick Start

This repository contains TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, bestpool filtering) with data from 2014 to 2022 using our modified OpenCLIP code. For additional information refer to our GitHub repo.

✨ Features

Keeping large foundation models up to date on latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large scale continual learning benchmarks or baselines.

We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps. TiC-DataComp, our largest dataset, contains over 12.7B timestamped image-text pairs spanning 9 years (2014 - 2022).

We first use our benchmarks to curate various dynamic evaluations to measure temporal robustness of existing models. We show OpenAI's CLIP (trained on data up to 2020) loses ≈8% zero-shot accuracy on our curated retrieval task from 2021 - 2022 compared with more recently trained models in the OpenCLIP repository. We then study how to efficiently train models on time-continuous data. We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by 2.5× when compared to the standard practice of retraining from scratch. Code is available at https://github.com/apple/ml-tic-clip.
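As a rough illustration of the rehearsal-based approach described above, the sketch below builds a per-step plan in which each step is initialized from the previous step's checkpoint and trains on the new data plus all replayed older data. The names `cumulative_plan`, `init_from`, `train_years`, and `save_as` are hypothetical and are not part of the ml-tic-clip code. The grouping of 2014-2016 into a single first step follows the note in the training snippet of this card; everything else is an assumption made for illustration.

```python
from typing import Dict, List, Optional


def cumulative_plan(steps: List[List[int]]) -> List[Dict[str, object]]:
    """Build a training plan where every step replays all earlier data.

    `steps` lists the years that arrive at each time step. This is an
    illustrative sketch, not the schedule used by the authors.
    """
    plan = []
    seen: List[int] = []
    prev_ckpt: Optional[str] = None
    for new_years in steps:
        seen = seen + new_years       # old years are kept and replayed
        ckpt = f"{new_years[-1]}.pt"  # checkpoint named after the latest year
        plan.append({
            "init_from": prev_ckpt,     # None means no earlier checkpoint exists
            "train_years": list(seen),  # new data plus replayed old data
            "save_as": ckpt,
        })
        prev_ckpt = ckpt
    return plan


# Assumed grouping: 2014-2016 form the first step, then one step per year.
steps = [[2014, 2015, 2016]] + [[y] for y in range(2017, 2023)]
for entry in cumulative_plan(steps):
    print(entry["init_from"], "->", entry["save_as"], entry["train_years"])
```

Only the first step starts without a checkpoint; every later step reuses the previous weights, which is where the compute saving over retraining from scratch comes from.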

Property	Details
Developed by	Apple
License	See LICENSE
Repository	ml-tic-clip GitHub repo
Paper	TiC-CLIP: Continual Training of CLIP Models, Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F., International Conference on Learning Representations (ICLR), 2024.

💻 Usage Examples

Basic Usage

The models are compatible with the DataComp evaluation suite and with our patched version of DataComp for evaluation on TiC-DataComp-Retrieval and TiC-DataCompNet. The models can also be used to resume training or as initialization for new training using OpenCLIP code. Please follow the instructions in our GitHub repo to create the evaluation sets, or follow DataComp for the standard evaluations on 38 datasets.

The following snippets assume the TiC-DataComp data has been prepared following the instructions in the GitHub repo.

Training

YEAR=2016 # There are no models before 2016 since data from 2014-2016 were combined into one year
REPO="apple/TiC-CLIP-bestpool-cumulative"
huggingface-cli download $REPO checkpoints/$YEAR.pt

## Train Cumulative
pushd datacomp
final_data_dir=$TIC_DATACOMP_Y_PATH/train/$YEAR/
torchrun --nproc_per_node 8 --nnodes 1 \
    train.py \
    --scale "tic_medium" \
    --dataset_resampled \
    --data_dir $final_data_dir \
    --output_dir "./results/" \
    --exp_name "datacomp_medium-basic_cumulative" \
    --imagenet_val  $IMAGENET_VAL_PATH  \
    --save_frequency 1 \
    --resume
popd
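The checkpoint download step above can also be done from Python. The sketch below is a minimal, hedged example: `hf_hub_download` is the same call used in the inference example on this card, and the `checkpoints/<year>.pt` pattern is taken from the shell snippet. The helper names and the upper bound of 2022 are assumptions (the card only states that there are no models before 2016 and that the data runs to 2022), so check the repository's file list before relying on a given year.

```python
REPO_ID = "apple/TiC-CLIP-bestpool-cumulative"
FIRST_YEAR, LAST_YEAR = 2016, 2022  # LAST_YEAR is an assumption, see lead-in


def checkpoint_filename(year: int) -> str:
    """Return the in-repo path of the checkpoint for `year`."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(
            f"no checkpoint expected for {year}; "
            f"choose a year in {FIRST_YEAR}-{LAST_YEAR}"
        )
    return f"checkpoints/{year}.pt"


def download_checkpoint(year: int) -> str:
    """Download the checkpoint and return its local path (needs network)."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(year))


print(checkpoint_filename(2016))  # checkpoints/2016.pt
```

Calling `download_checkpoint(2016)` returns the cached local file path, which can then be passed to OpenCLIP as the pretrained weights.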

Evaluation

## Evaluate Model
# Evaluate a ViT-B/16 model on TiC/Retrieval/Yearly/$YEAR and
# TiC/DataCompNet/Yearly/$YEAR
pushd datacomp
# Generate the task list for the yearly retrieval and DataCompNet evaluations
python ../dataset_creation/tic-datacomp/generate_tasklist.py --yaml-path tasklist.yml --sample-eval --eval-tasks retrieval/yearly,datacompnet/yearly
# Run the evaluation with the checkpoint for $YEAR
python evaluate.py --data_dir data/ --train_output_dir ./results --use_model "ViT-B-16 $YEAR.pt" --skip_hf --skip_db --skip_notification
popd
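For readers unfamiliar with retrieval evaluation, the toy sketch below computes recall@k, a common metric for image-text retrieval. It is not the DataComp or TiC-DataComp evaluation code, and whether the patched suite reports exactly this metric should be checked in the GitHub repo. The function name `recall_at_k` and the similarity values are made up for illustration, and the sketch assumes the correct candidate for query i is candidate i.

```python
from typing import List


def recall_at_k(similarity: List[List[float]], k: int = 1) -> float:
    """Fraction of queries whose matching item (same index) is in the top k.

    similarity[i][j] is the score between query i and candidate j.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted by descending score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores, e.g. cosine similarities between 3 captions and 3 images.
sim = [
    [0.9, 0.1, 0.0],  # query 0: correct item ranked first
    [0.2, 0.3, 0.8],  # query 1: correct item ranked second
    [0.1, 0.2, 0.7],  # query 2: correct item ranked first
]
print(recall_at_k(sim, k=1))
print(recall_at_k(sim, k=2))
```

In this toy case two of the three queries rank their match first, and all three have it in the top two.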

OpenCLIP Load and Inference Example

import torch
import open_clip
from huggingface_hub import hf_hub_download
from PIL import Image

# Download the 2016 checkpoint and load it into a ViT-B/16 OpenCLIP model
filename = hf_hub_download(repo_id="apple/TiC-CLIP-bestpool-cumulative", filename="checkpoints/2016.pt")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16', filename)
tokenizer = open_clip.get_tokenizer('ViT-B-16')

# Prepare one image and a set of candidate captions
image = preprocess(Image.open("image.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product below is a cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Scale the similarities and convert them to probabilities over the captions
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
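To make the last step of the inference example concrete, the dependency-free sketch below reproduces the same arithmetic on toy vectors: L2-normalize the features, take cosine similarities, scale by 100, and apply a softmax. The helper names and the two-dimensional feature values are hypothetical; real CLIP features come from `model.encode_image` and `model.encode_text`.

```python
import math
from typing import List


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def label_probs(image_feat: List[float],
                text_feats: List[List[float]],
                scale: float = 100.0) -> List[float]:
    """Softmax over scaled cosine similarities between one image and N texts."""
    img = normalize(image_feat)
    logits = [
        scale * sum(a * b for a, b in zip(img, normalize(t)))
        for t in text_feats
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy features: the first text vector points almost the same way as the image.
probs = label_probs([1.0, 0.0], [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(probs)
```

Because of the 100× scale, even small differences in cosine similarity produce a sharply peaked distribution, which is why the top label usually receives nearly all of the probability mass.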

🔧 Technical Details

Training Data

Please refer to TiC-DataComp.

Training Procedure

Please refer to Sections 2 - 3 of our TiC-CLIP paper.

📄 License

License: other
License name: custom-apple-license
License link: https://github.com/apple/ml-tic-clip/blob/main/LICENSE

📚 Documentation

Citation

TiC-CLIP: Continual Training of CLIP Models. (ICLR 2024) Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F.

@inproceedings{garg2024tic,
  title={TiC-CLIP: Continual Training of CLIP Models},
  author={Garg, Saurabh and Farajtabar, Mehrdad and Pouransari, Hadi and Vemulapalli, Raviteja and Mehta, Sachin and Tuzel, Oncel and Shankar, Vaishaal and Faghri, Fartash},
  booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
  year={2024},
  url={https://openreview.net/forum?id=TLADT8Wrhn}
}
