# Olympus: A Universal Task Router for Computer Vision Tasks
Olympus is a universal task router designed for computer vision. It provides a single interface for handling a wide range of CV tasks, supporting 20 different tasks across diverse applications. The paper was accepted to CVPR 2025 as a highlight.

## Quick Start
### Environment Installation
To set up the environment, run the following commands in a shell:

```bash
git clone https://github.com/yuanze-lin/Olympus.git
cd Olympus
conda create -n olympus python==3.10 -y
conda activate olympus
pip install -r requirements.txt
```

This will create the `olympus` environment.
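Optionally, you can verify the environment before downloading any checkpoints. The snippet below is a minimal sanity check; it only assumes that PyTorch is installed via `requirements.txt`:

```python
# Optional sanity check for the freshly created "olympus" environment.
# Assumes PyTorch is pulled in by requirements.txt.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```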
### Download Models & Data
We share our collected Olympus dataset as follows:

- `Olympus_20tasks_all`: There are 20 JSON files under the `20 individual tasks` folder, each corresponding to a specific task. Refer to the routing token definitions in our paper to identify the task associated with each JSON file; the chain-of-action data is provided in `coa.json`. Each of these 21 JSON files includes both training and test data.
- `Olympus.json`: The final fine-tuning data.
(1) Download the Olympus model:

```bash
python download_olympus.py
```

This will save the `Olympus` model under the `ckpts` folder.
(2) Download the Olympus data for fine-tuning:

```bash
python download_olympus_json.py
```

The JSON data will be saved as `Olympus.json` in the `train_data` folder. Note that `Olympus.json` includes `llava_v1_5_mix665k.json` combined with our collected data from 20 tasks.
If you want to merge the data manually, first create the `jsons` folder by running `mkdir jsons`, download all the JSON files from `Olympus_20tasks_all` and `llava_v1_5_mix665k.json` into the `jsons` folder, then run the merge script:

```bash
python scripts/merge_data.py
```
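For reference, the sketch below shows roughly what this merge step amounts to. It is an illustration, not the repository's `scripts/merge_data.py`, and it assumes every file in `jsons/` is a LLaVA-style JSON list of training records:

```python
# Illustrative merge sketch (not scripts/merge_data.py).
# Assumes each file in jsons/ is a JSON list of conversation records.
import json
from pathlib import Path

merged = []
for json_file in sorted(Path("jsons").glob("*.json")):
    with open(json_file, "r") as f:
        merged.extend(json.load(f))  # concatenate all records in order

Path("train_data").mkdir(exist_ok=True)
with open("train_data/Olympus.json", "w") as f:
    json.dump(merged, f)

print(f"Merged {len(merged)} records into train_data/Olympus.json")
```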
(3) Download the Mipha-3B model for fine-tuning:

```bash
python download_mipha_3b.py
```

This will save the `Mipha-3B` model under the `ckpts` folder.
### Inference
Run the following code for inference:

```bash
model_name=Olympus
MODELDIR=ckpts/$model_name

python predict.py \
    --prompt "Generate an image of a fluffy orange cat lounging on a windowsill, \
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere. \
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching. \
In the following step, produce a high-resolution 3D model based on the modified image. \
At the next point, please show a video of a cat and a dog running on a playground." \
    --model-path $MODELDIR \
    --temperature 0 \
    --conv-mode v0
```
Alternatively, you can run `bash predict.sh` as we did.
The prediction should look like:

```text
Input Prompt: Generate an image of a fluffy orange cat lounging on a windowsill,
with sunlight streaming through the glass and casting soft shadows to create a cozy atmosphere.
Next, would it be possible to change the cat's color to white? This change will make it more eye-catching.
In the following step, produce a high-resolution 3D model based on the modified image.
At the next point, please show a video of a cat and a dog running on a playground.

Output: <image_gen>a fluffy orange cat lounging on a windowsill, with sunlight streaming
through the glass and casting soft shadows to create a cozy atmosphere.</image_gen>
<image_edit>change the cat's color to white.</image_edit>
<3D_gen_image>produce a high-resolution 3D model based on the modified image.</3D_gen_image>
<video_gen>a cat and a dog running on a playground.</video_gen>
```
Change `--prompt` to customize the input prompt as needed.
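Since the routing tokens are emitted as plain text, downstream dispatch can be done with simple pattern matching. The snippet below is only an illustrative sketch based on the example output above; the task names are taken from that output, and the dispatch itself is left as a placeholder rather than a real integration with task-specific models:

```python
# Illustrative sketch: parse routing tokens such as <image_gen>...</image_gen>
# from Olympus output. The print step is a placeholder, not part of this repo.
import re

output = (
    "<image_gen>a fluffy orange cat lounging on a windowsill.</image_gen>"
    "<image_edit>change the cat's color to white.</image_edit>"
    "<video_gen>a cat and a dog running on a playground.</video_gen>"
)

# Match <task>prompt</task> pairs in the order they appear.
for task, prompt in re.findall(r"<(\w+)>(.*?)</\1>", output, flags=re.DOTALL):
    print(f"route to {task}: {prompt}")
```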
### Visual Instruction Tuning

Please refer here to prepare the instruction tuning data. In particular, store the images from the different datasets under the `train_data` folder.
Run the following code to fine-tune the model:

```bash
bash scripts/mipha/finetune.sh
```
### Evaluation
To evaluate the model's performance on different benchmarks, see Evaluation.md.

Please place the evaluation data under the `eval` folder. The evaluation scripts are located under `scripts/mipha/eval/`.
For example, to test the model's performance on the VQAv2 dataset, simply run:

```bash
bash scripts/mipha/eval/vqav2.sh
```
## Features
### Supported Capacities (Covering 20 Tasks)
### Diverse Applications
## License
This project is licensed under the Apache-2.0 license.
## Citation

If you find Olympus useful for your research and applications, please cite it using this BibTeX:

```bibtex
@article{lin2024olympus,
  title={Olympus: A Universal Task Router for Computer Vision Tasks},
  author={Lin, Yuanze and Li, Yunsheng and Chen, Dongdong and Xu, Weijian and Clark, Ronald and Torr, Philip HS},
  journal={arXiv preprint arXiv:2412.09612},
  year={2024}
}
```
## Acknowledgement
Our project is built upon the following foundations:
- Mipha: An impressive open-source project for lightweight vision-language assistants
- LLaVA: A powerful open-source vision-language assistant project
## News
- [ ] Release the code for integration with task-specific models.
- [x] Release the training & inference code.
- [x] Release Olympus datasets.
- [x] Release the model of Olympus.