# 🚀 TinyLlama-based Visual Question Answering Model
This project is a visual question answering model trained with TinyLlama as the base model, enabling it to answer questions about images.
## 🚀 Quick Start
This model is trained using TinyLlama as the base model via the BakLlava repo.
## ✨ Features
- Utilizes TinyLlama as the base model for visual question answering tasks.
- Supports multi-model comparison in the Gradio interface.
## 📦 Installation
If you are not using Linux, do NOT proceed with these steps; see the LLaVA instructions for macOS and Windows instead.
- Clone this repository and navigate to the LLaVA folder

```bash
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
```
- Install the package

```bash
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
- Install additional packages for training cases

```bash
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
Upgrade to the latest code base:

```bash
git pull
pip install -e .
```
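To confirm the editable install succeeded, a quick import check can be run (a minimal sketch; any import error here points to an installation problem):

```bash
# If this prints without a traceback, the llava package is installed correctly
python -c "import llava; print('llava package imported successfully')"
```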
## 💻 Usage Examples
### Basic Usage
The prompt used for both examples was "What is shown in the given image?"
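To try a similar single-image query locally without the web UI, the LLaVA command-line interface can be used. This is a minimal sketch: the image path is a placeholder you would replace, and the model path is this project's checkpoint as used later in this README.

```bash
# Interactive CLI inference: after loading, type the question
# (e.g. "What is shown in the given image?") at the prompt
python -m llava.serve.cli \
    --model-path ameywtf/tinyllava-1.1b-v0.1 \
    --image-file "path/to/your_image.jpg"
```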
### Advanced Usage
#### Launch a controller

```bash
python -m llava.serve.controller --host 0.0.0.0 --port 10000
```
#### Launch a Gradio web server

```bash
python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload
```
You've just launched the Gradio web interface. You can now open it at the URL printed on the screen. You may notice that there is no model in the model list; don't worry, we haven't launched any model workers yet. The list will update automatically once you launch a model worker.
#### Launch a model worker

This is the actual worker that performs the inference on the GPU. Each worker is responsible for a single model specified in `--model-path`.

```bash
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ameywtf/tinyllava-1.1b-v0.1
```
Wait until the process finishes loading the model and you see "Uvicorn running on ...". Now, refresh your Gradio web UI, and you'll see the model you just launched in the model list.
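If GPU memory is tight, the upstream LLaVA model worker also documents quantized inference via `--load-4bit` (or `--load-8bit`) appended to the same command. This is a sketch based on the LLaVA repository and has not been verified with this specific checkpoint:

```bash
# Same worker launch as above, but with 4-bit quantized inference to reduce VRAM usage
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ameywtf/tinyllava-1.1b-v0.1 --load-4bit
```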
You can launch as many workers as you want and compare between different model checkpoints in the same Gradio interface. Please keep the `--controller` the same and modify the `--port` and `--worker` to a different port number for each worker.

```bash
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port <different from 40000, say 40001> --worker http://localhost:<change accordingly, i.e. 40001> --model-path <ckpt2>
```
If you are using an Apple device with an M1 or M2 chip, you can specify the mps device by using the `--device` flag: `--device mps`.
## 📄 License
This project is licensed under the Apache-2.0 license.