TiC-CLIP-bestpool-oracle Model Card
This repository offers TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, bestpool filtering) with data from 2014 to 2022 using our modified OpenCLIP code. For more details, visit our GitHub repo.
⨠Features
- Time-Continual Benchmarks: Introduces the first web-scale Time-Continual (TiC) benchmarks for vision-language models, including TiC-DataComp, TiC-YFCC, and TiC-Redcaps.
- Cost-Effective Training: Demonstrates a rehearsal-based approach that reduces compute by 2.5× compared to retraining from scratch.
📦 Installation
The models are compatible with the DataComp evaluation suite and our patched version of DataComp for evaluation on TiC-DataComp-Retrieval and TiC-DataCompNet. You can also use them to resume training or to initialize new training runs with the OpenCLIP code. Follow the instructions in our GitHub repo to create the evaluation sets, or use DataComp for the standard evaluations on 38 datasets.
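For inference with the released checkpoints alone, a minimal environment sketch like the following is usually sufficient; the PyPI package names are the standard ones, exact versions are left to you, and the DataComp patching steps for training and TiC evaluations are covered in the GitHub repo.
# Inference-only setup sketch; training and TiC evaluation setup follow the GitHub repo README
pip install torch torchvision open_clip_torch huggingface_hub
# For training and TiC evaluations, clone the repository and follow its instructions
git clone https://github.com/apple/ml-tic-clip.git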
💻 Usage Examples
Basic Usage
The following snippets assume the TiC-DataComp data has been prepared and that you've followed the instructions in the GitHub repo.
Training
# Pick a training year and download its checkpoint from this repository (stored as checkpoints/$YEAR.pt)
YEAR=2016
REPO="apple/TiC-CLIP-bestpool-oracle"
huggingface-cli download $REPO checkpoints/$YEAR.pt
# Continue training on the TiC-DataComp yearly data with our patched DataComp code
pushd datacomp
final_data_dir=$TIC_DATACOMP_Y_PATH/train/$YEAR/
torchrun --nproc_per_node 8 --nnodes 1 \
    train.py \
    --scale "tic_medium" \
    --dataset_resampled \
    --data_dir $final_data_dir \
    --output_dir "./results/" \
    --exp_name "datacomp_medium-basic_cumulative" \
    --imagenet_val $IMAGENET_VAL_PATH \
    --save_frequency 1 \
    --resume
popd
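If you want all yearly checkpoints locally (for example, to sweep evaluations over years), a pattern download along these lines should also work; the checkpoints/ layout matches the command above, the target directory name is arbitrary, and the --include flag requires a reasonably recent huggingface_hub.
# Download every yearly checkpoint from this repository into a local folder (sketch)
huggingface-cli download apple/TiC-CLIP-bestpool-oracle --include "checkpoints/*" --local-dir ./TiC-CLIP-bestpool-oracle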
Evaluation
pushd datacomp
# Generate the list of TiC evaluation tasks (yearly retrieval and DataCompNet splits)
python ../dataset_creation/tic-datacomp/generate_tasklist.py --yaml-path tasklist.yml --sample-eval --eval-tasks retrieval/yearly,datacompnet/yearly
# Evaluate the year-$YEAR checkpoint with the ViT-B-16 architecture
python evaluate.py --data_dir data/ --train_output_dir ./results --use_model "ViT-B-16 $YEAR.pt" --skip_hf --skip_db --skip_notification
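Because the point of the TiC benchmarks is tracking performance over time, the same evaluation is typically repeated for every yearly checkpoint. The loop below is only an illustrative wrapper around the command above; the year list is an example, and it assumes each checkpoints/$YEAR.pt has already been downloaded as in the training snippet.
# Illustrative sweep over yearly checkpoints (year list is an example)
for YEAR in 2016 2017 2018 2019 2020 2021 2022; do
    python evaluate.py --data_dir data/ --train_output_dir ./results \
        --use_model "ViT-B-16 $YEAR.pt" --skip_hf --skip_db --skip_notification
done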
OpenCLIP Load and Inference Example
import torch
from PIL import Image
import open_clip
from huggingface_hub import hf_hub_download
# Download a yearly checkpoint from this repository and load it with OpenCLIP
filename = hf_hub_download(repo_id="apple/TiC-CLIP-bestpool-oracle", filename="checkpoints/2016.pt")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16', filename)
tokenizer = open_clip.get_tokenizer('ViT-B-16')
image = preprocess(Image.open("image.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])
with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print("Label probs:", text_probs)
📚 Documentation
Model Details
Model Description
Keeping large foundation models up to date with the latest data is costly. To avoid the high cost of constant retraining, continual training is necessary. The lack of large-scale continual learning benchmarks or baselines makes this problem worse. We introduce the first web-scale Time-Continual (TiC) benchmarks for training vision-language models. Our largest dataset, TiC-DataComp, contains over 12.7B timestamped image-text pairs from 2014-2022. We use these benchmarks for dynamic evaluations and show that OpenAI's CLIP (trained on data up to 2020) loses ≈8% zero-shot accuracy on our curated retrieval task from 2021-2022 compared to more recently trained models in the OpenCLIP repository. We also study efficient training on time-continuous data and show that a rehearsal-based approach reduces compute by 2.5×.
- Developed by: Apple
- License: See LICENSE
Model Sources
- Repository: ml-tic-clip GitHub repo (https://github.com/apple/ml-tic-clip)
- Paper: TiC-CLIP: Continual Training of CLIP Models, Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F., International Conference on Learning Representations (ICLR), 2024.
Training Details
Training Data
Refer to TiC-DataComp.
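The yearly data is distributed as the apple/TiC-DataComp dataset on the Hugging Face Hub (see the table at the end of this card). A plain fetch of the release could look like the sketch below; note that, as with DataComp, the release is metadata-based and the actual image-text shards are reconstructed following the GitHub repo instructions (the local directory name is arbitrary).
# Sketch: download the TiC-DataComp release (metadata) from the Hub
huggingface-cli download apple/TiC-DataComp --repo-type dataset --local-dir ./TiC-DataComp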
Training Procedure
Refer to Sections 2-3 of our TiC-CLIP paper.
📄 License
This model is released under the custom-apple-license.
📚 Citation
TiC-CLIP: Continual Training of CLIP Models. (ICLR 2024)
Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F.
@inproceedings{garg2024tic,
  title={TiC-CLIP: Continual Training of CLIP Models},
  author={Garg, Saurabh and Farajtabar, Mehrdad and Pouransari, Hadi and Vemulapalli, Raviteja and Mehta, Sachin and Tuzel, Oncel and Shankar, Vaishaal and Faghri, Fartash},
  booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
  year={2024},
  url={https://openreview.net/forum?id=TLADT8Wrhn}
}
| Property | Details |
|----------|---------|
| Model Type | TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, bestpool filtering) |
| Training Data | apple/TiC-DataComp |
| Library Name | tic-clip |
| Tags | vision, zero-shot-image-classification |