# Olympus: A Universal Task Router for Computer Vision Tasks
Olympus is a universal task router designed for computer vision. It provides a single interface for handling a wide range of CV tasks, supporting 20 different tasks across diverse applications. The paper was accepted to CVPR 2025 as a highlight.

## Quick Start
### Environment Installation
To set up the environment, run the following commands in a shell:

```bash
git clone https://github.com/yuanze-lin/Olympus.git
cd Olympus
conda create -n olympus python==3.10 -y
conda activate olympus
pip install -r requirements.txt
```

This will create the `olympus` environment.
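Optionally, you can verify the environment before downloading any checkpoints. The snippet below is a minimal sanity check; it only assumes that PyTorch is installed via `requirements.txt`:

```python
# Optional sanity check for the freshly created "olympus" environment.
# Assumes PyTorch is pulled in by requirements.txt.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```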
### Download Models & Data
We share our collected Olympus dataset as follows:

- `Olympus_20tasks_all`: There are 20 JSON files under the `20 individual tasks` folder, each corresponding to a specific task. Refer to the routing token definitions in our paper to identify the task associated with each JSON file; the chain-of-action data is provided in `coa.json`. Each of these 21 JSON files includes both training and test data.
- `Olympus.json`: The final fine-tuning data.
(1) Download the Olympus model:

```bash
python download_olympus.py
```

This will save the `Olympus` model under the `ckpts` folder.
(2) Download the Olympus data for fine-tuning:

```bash
python download_olympus_json.py
```

The JSON data will be saved as `Olympus.json` in the `train_data` folder. Note that `Olympus.json` includes `llava_v1_5_mix665k.json` combined with our collected data from 20 tasks.
If you want to merge the data manually, first create the `jsons` folder by running `mkdir jsons`, download all the JSON files from `Olympus_20tasks_all` and `llava_v1_5_mix665k.json` into the `jsons` folder, then run the merge script:

```bash
python scripts/merge_data.py
```
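For reference, the sketch below shows roughly what this merge step amounts to. It is an illustration, not the repository's `scripts/merge_data.py`, and it assumes every file in `jsons/` is a LLaVA-style JSON list of training records:

```python
# Illustrative merge sketch (not scripts/merge_data.py).
# Assumes each file in jsons/ is a JSON list of conversation records.
import json
from pathlib import Path

merged = []
for json_file in sorted(Path("jsons").glob("*.json")):
    with open(json_file, "r") as f:
        merged.extend(json.load(f))  # concatenate all records in order

Path("train_data").mkdir(exist_ok=True)
with open("train_data/Olympus.json", "w") as f:
    json.dump(merged, f)

print(f"Merged {len(merged)} records into train_data/Olympus.json")
```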
(3) Download the Mipha-3B model for fine-tuning:

```bash
python download_mipha_3b.py
```

This will save the `Mipha-3B` model under the `ckpts` folder.
### Inference
Run the following code for inference:

```bash
model_name=Olympus
MODELDIR=ckpts/$model_name

python predict.py \
    --prompt "Generate an image of a fluffy orange cat lounging on a windowsill, \
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere. \
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching. \
In the following step, produce a high-resolution 3D model based on the modified image. \
At the next point, please show a video of a cat and a dog running on a playground." \
    --model-path $MODELDIR \
    --temperature 0 \
    --conv-mode v0
```
Alternatively, you can run `bash predict.sh` as we did.
The prediction should look like:

```text
Input Prompt: Generate an image of a fluffy orange cat lounging on a windowsill,
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere.
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching.
In the following step, produce a high-resolution 3D model based on the modified image.
At the next point, please show a video of a cat and a dog running on a playground.

Output: <image_gen>a fluffy orange cat lounging on a windowsill, with sunlight streaming
through the glass and casting soft shadows to create a cozy atmosphere.</image_gen>
<image_edit>change the cat's color to white.</image_edit>
<3D_gen_image>produce a high-resolution 3D model based on the modified image.</3D_gen_image>
<video_gen>a cat and a dog running on a playground.</video_gen>
```
Change `--prompt` to customize the input prompt as needed.
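Since the routing tokens are emitted as plain text, downstream dispatch can be done with simple pattern matching. The snippet below is only an illustrative sketch based on the example output above; the task names are taken from that output, and the dispatch itself is left as a placeholder rather than a real integration with task-specific models:

```python
# Illustrative sketch: parse routing tokens such as <image_gen>...</image_gen>
# from Olympus output. The print step is a placeholder, not part of this repo.
import re

output = (
    "<image_gen>a fluffy orange cat lounging on a windowsill.</image_gen>"
    "<image_edit>change the cat's color to white.</image_edit>"
    "<video_gen>a cat and a dog running on a playground.</video_gen>"
)

# Match <task>prompt</task> pairs in the order they appear.
for task, prompt in re.findall(r"<(\w+)>(.*?)</\1>", output, flags=re.DOTALL):
    print(f"route to {task}: {prompt}")
```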
### Visual Instruction Tuning

Please refer here to prepare the instruction tuning data. In particular, store the images from the different datasets under the `train_data` folder.
Run the following code to fine-tune the model:

```bash
bash scripts/mipha/finetune.sh
```
### Evaluation
To evaluate the model's performance on different benchmarks, see Evaluation.md.

Please place the evaluation data under the `eval` folder. The evaluation scripts are located under `scripts/mipha/eval/`.
For example, to test the model's performance on the VQAv2 dataset, simply run:

```bash
bash scripts/mipha/eval/vqav2.sh
```
## Features
### Supported Capacities (Covering 20 Tasks)
### Diverse Applications
## License
This project is licensed under the Apache-2.0 license.
## Citation

If you find Olympus useful for your research and applications, please cite it using this BibTeX:

```bibtex
@article{lin2024olympus,
  title={Olympus: A Universal Task Router for Computer Vision Tasks},
  author={Lin, Yuanze and Li, Yunsheng and Chen, Dongdong and Xu, Weijian and Clark, Ronald and Torr, Philip HS},
  journal={arXiv preprint arXiv:2412.09612},
  year={2024}
}
```
## Acknowledgement
Our project is built upon the following foundations:
- Mipha: An impressive open-source project for lightweight vision-language assistants
- LLaVA: A powerful open-source vision-language assistant project
## News
- [ ] Release the code for integration with task-specific models.
- [x] Release the training & inference code.
- [x] Release Olympus datasets.
- [x] Release the model of Olympus.