Model Card for TiC-CLIP-bestpool-sequential
This repository offers TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, bestpool filtering) with data from 2014 to 2022, using modified OpenCLIP code. For more details, visit our GitHub repo.
Features
- Vision and Zero-Shot Image Classification: Ideal for vision-related tasks and zero-shot image classification.
- Time-Continual Benchmarks: Introduces web-scale Time-Continual (TiC) benchmarks for vision-language model training.
- Efficient Training Approach: Demonstrates a rehearsal-based method to reduce compute costs.
Installation
The models are compatible with the DataComp evaluation suite and with our patched version of it for evaluation on TiC-DataComp-Retrieval and TiC-DataCompNet. Follow the instructions in our GitHub repo to create the evaluation sets, or use DataComp directly for the standard evaluations on 38 datasets.
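If you want to check which yearly checkpoints the repository provides before downloading one, here is a minimal sketch using the Hugging Face Hub API; the exact file list is an assumption based on the checkpoints/$YEAR.pt pattern used in the usage examples below.

from huggingface_hub import list_repo_files

# List the repository files and keep the yearly checkpoints.
files = list_repo_files("apple/TiC-CLIP-bestpool-sequential")
checkpoints = sorted(f for f in files if f.startswith("checkpoints/") and f.endswith(".pt"))
print(checkpoints)  # expected to follow the checkpoints/<YEAR>.pt pattern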
Usage Examples
Basic Usage
YEAR=2016
REPO="apple/TiC-CLIP-bestpool-sequential"
huggingface-cli download $REPO checkpoints/$YEAR.pt
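# Train on the chosen year's TiC-DataComp-Yearly data with the DataComp training script.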
pushd datacomp
final_data_dir=$TIC_DATACOMP_Y_PATH/train/$YEAR/
torchrun --nproc_per_node 8 --nnodes 1 \
    train.py \
    --scale "tic_medium" \
    --dataset_resampled \
    --data_dir $final_data_dir \
    --output_dir "./results/" \
    --exp_name "datacomp_medium-basic_cumulative" \
    --imagenet_val $IMAGENET_VAL_PATH \
    --save_frequency 1 \
    --resume
popd
Advanced Usage
pushd datacomp
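# Generate the evaluation task list for the yearly retrieval and DataCompNet splits, then evaluate the downloaded checkpoint.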
python ../dataset_creation/tic-datacomp/generate_tasklist.py --yaml-path tasklist.yml --sample-eval --eval-tasks retrieval/yearly,datacompnet/yearly
python evaluate.py --data_dir data/ --train_output_dir ./results --use_model "ViT-B-16 $YEAR.pt" --skip_hf --skip_db --skip_notification
OpenCLIP Load and Inference Example
import torch
from PIL import Image
import open_clip
from huggingface_hub import hf_hub_download

filename = hf_hub_download(repo_id="apple/TiC-CLIP-bestpool-sequential", filename="checkpoints/2016.pt")
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16', filename)
tokenizer = open_clip.get_tokenizer('ViT-B-16')

image = preprocess(Image.open("image.png").convert('RGB')).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
Documentation
Model Details
Model Description
Keeping large foundation models up to date with the latest data is inherently expensive. To avoid the prohibitive cost of constantly retraining from scratch, it is essential to train these models continually, yet large-scale benchmarks and baselines for continual training have been lacking. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-Redcaps. TiC-DataComp, our largest dataset, contains over 12.7B timestamped image-text pairs spanning 2014 - 2022. We first use these benchmarks for dynamic evaluations that measure the temporal robustness of existing models, and show that OpenAI's CLIP (trained on data up to 2020) loses approximately 8% zero-shot accuracy on our curated retrieval task from 2021 - 2022 compared to more recently trained models in the OpenCLIP repository. We then study efficient ways to train models on time-continuous data and demonstrate that a simple rehearsal-based approach, which continues training from the last checkpoint and replays old data, reduces compute by 2.5x compared to retraining from scratch. Code is available in the ml-tic-clip GitHub repo.
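As a rough, illustrative sketch of the rehearsal idea above (not the exact recipe from the paper; the replay_fraction hyperparameter below is a placeholder), continual training resumes from the last checkpoint and mixes the new time step's data with a replayed subset of older data:

import random

def build_rehearsal_mix(old_samples, new_samples, replay_fraction=0.5, seed=0):
    # Combine the new time step's samples with a random subset of previously
    # seen samples. The actual data mixing and training schedule used for
    # TiC-CLIP are described in Sections 2 - 3 of the paper.
    rng = random.Random(seed)
    n_replay = min(int(len(new_samples) * replay_fraction), len(old_samples))
    mix = list(new_samples) + rng.sample(list(old_samples), n_replay)
    rng.shuffle(mix)
    return mix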
- Developed by: Apple
- License: See LICENSE
Model Sources
- Repository: ml-tic-clip GitHub repo
- Paper: TiC-CLIP: Continual Training of CLIP Models, Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F., International Conference on Learning Representations (ICLR), 2024.
Uses
Researchers can use the TiC-CLIP pretrained models to prototype continual learning methods more quickly by starting from a pretrained checkpoint and training only on the data from subsequent years or months.
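For example, a minimal sketch of warm-starting from one of the yearly checkpoints with OpenCLIP (the optimizer, data loading, and training loop are omitted and would follow your own continual-learning setup):

import open_clip
from huggingface_hub import hf_hub_download

# Download a yearly checkpoint and use it to initialize a ViT-B-16 model.
ckpt = hf_hub_download(
    repo_id="apple/TiC-CLIP-bestpool-sequential",
    filename="checkpoints/2016.pt",
)
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16", pretrained=ckpt)
model.train()  # continue training on data from subsequent years or months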
Training Details
Training Data
Refer to TiC-DataComp.
Training Procedure
Refer to Sections 2 - 3 of our TiC-CLIP paper.
Citation
TiC-CLIP: Continual Training of CLIP Models. (ICLR 2024)
Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V. and Faghri, F.
@inproceedings{garg2024tic,
  title={TiC-CLIP: Continual Training of CLIP Models},
  author={Garg, Saurabh and Farajtabar, Mehrdad and Pouransari, Hadi and Vemulapalli, Raviteja and Mehta, Sachin and Tuzel, Oncel and Shankar, Vaishaal and Faghri, Fartash},
  booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
  year={2024},
  url={https://openreview.net/forum?id=TLADT8Wrhn}
}
License
This project is licensed under the custom-apple-license; see the LICENSE file for details.
| Property | Details |
| --- | --- |
| Model Type | TiC-CLIP models trained on TiC-DataComp-Yearly (xlarge, bestpool filtering) |
| Training Data | apple/TiC-DataComp |
| Library Name | tic-clip |
| Tags | vision, zero-shot-image-classification |