TinyLlava-1.1B-v0.1 Open-Source Visual Question-Answering Model - Lightweight Design Enables Image Question Answering

Tinyllava 1.1b V0.1

Developed by 0xAmey

A lightweight visual question answering model based on TinyLlama-1.1B, trained using the BakLlava codebase

Open Source License: Apache-2.0 #Lightweight Visual Question Answering #Multimodal Dialogue #Efficient Inference with Small Models

Downloads 16

Release Time: 11/1/2023

Model Overview

This is a multimodal model combining vision and language understanding, capable of answering questions based on image content

Model Features

Lightweight Architecture

Based on TinyLlama with 1.1B parameters, suitable for resource-limited environments

Multimodal Understanding

Processes both visual and linguistic information to achieve image content understanding

Open Source License

Apache-2.0 licensed, allowing both commercial and research use

Model Capabilities

Image Content Understanding

Visual Question Answering

Multimodal Reasoning

Use Cases

Content Understanding

Image Caption Generation

Generates textual descriptions based on image content

Examples show accurate recognition of anime and AI-generated image content

Educational Assistance

Visual Learning Assistant

Helps students understand image content in textbooks

🚀 TinyLlava-1.1B-v0.1

This project is a visual question-answering model that uses TinyLlama as its base model and was trained with the BakLlava codebase, so that users can ask questions about images.

🚀 Quick Start

This model is a visual question-answering model trained with the BakLlava repo, using TinyLlama as the base model.

✨ Features

Uses the language understanding ability of TinyLlama.
Combines it with the image-related features of BakLlava to achieve visual question answering.

📦 Installation

If you are not using Linux, do NOT proceed; see the instructions for macOS and Windows instead.

Clone this repository and navigate to the LLaVA folder

git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA

Install Package

conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

Install additional packages for training

pip install -e ".[train]"
pip install flash-attn --no-build-isolation

Upgrade to the latest code base

git pull
pip install -e .
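The installation steps above assume Linux and a Python 3.10 environment. As a minimal sketch, the following stdlib-only check reports whether the current interpreter matches those two assumptions before you start installing. The function name `preflight` is hypothetical and is not part of LLaVA.

```python
import platform
import sys


def preflight(system: str, python_version: tuple) -> list:
    """Return a list of warnings; an empty list means both checks passed."""
    problems = []
    # The installation steps above are written for Linux only.
    if system != "Linux":
        problems.append(f"unsupported OS for these steps: {system}")
    # The conda environment above is created with python=3.10.
    major_minor = tuple(python_version[:2])
    if major_minor != (3, 10):
        problems.append("expected Python 3.10, found %d.%d" % major_minor)
    return problems


if __name__ == "__main__":
    # Check the interpreter that is running this script.
    for line in preflight(platform.system(), sys.version_info) or ["environment looks OK"]:
        print(line)
```

Run it inside the activated conda environment; any line other than "environment looks OK" points at a mismatch with the steps above.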

💻 Usage Examples

Basic Usage

The prompt for both examples was "What is shown in the given image?"

Advanced Usage

Launch a controller

python -m llava.serve.controller --host 0.0.0.0 --port 10000

Launch a Gradio web server

python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload

You just launched the Gradio web interface. Now, you can open the web interface with the URL printed on the screen. You may notice that there is no model in the model list. Do not worry, as we have not launched any model worker yet. It will be automatically updated when you launch a model worker.

Launch a model worker

This is the actual worker that performs the inference on the GPU. Each worker is responsible for a single model specified in --model-path.

python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ameywtf/tinyllava-1.1b-v0.1

Wait until the process finishes loading the model and you see "Uvicorn running on ...". Now, refresh your Gradio web UI, and you will see the model you just launched in the model list.
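If you script the launch instead of watching the log, one rough way to wait for the worker is to poll its port until it accepts TCP connections. The sketch below uses only the Python standard library; `wait_for_port` is a hypothetical helper, not part of LLaVA. It assumes the worker only starts listening once the model has loaded, so treat an open port as an approximation of the "Uvicorn running on ..." message rather than a guarantee.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0, interval_s: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    # 40000 is the worker port used in the command above.
    ready = wait_for_port("localhost", 40000, timeout_s=300.0)
    print("worker port is open" if ready else "worker did not come up in time")
```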

You can launch as many workers as you want, and compare between different model checkpoints in the same Gradio interface. Please keep the --controller the same, and modify the --port and --worker to a different port number for each worker.

python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port <different from 40000, say 40001> --worker http://localhost:<change accordingly, i.e. 40001> --model-path <ckpt2>
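The port bookkeeping for several workers can be sketched as a small generator of command lines. This is only an illustration built from the flags shown above: `worker_commands` is a hypothetical helper, the ports simply count up from 40000, and `path/to/ckpt2` is a placeholder for whichever second checkpoint you want to compare.

```python
import shlex


def worker_commands(model_paths, controller="http://localhost:10000", first_port=40000):
    """Build one model_worker command per checkpoint, each on its own port."""
    commands = []
    for offset, model_path in enumerate(model_paths):
        # Each worker gets a distinct port; --worker must point at the same port.
        port = first_port + offset
        args = [
            "python", "-m", "llava.serve.model_worker",
            "--host", "0.0.0.0",
            "--controller", controller,  # identical for every worker
            "--port", str(port),
            "--worker", f"http://localhost:{port}",
            "--model-path", model_path,
        ]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # "path/to/ckpt2" is a placeholder, not a real checkpoint name.
    for cmd in worker_commands(["ameywtf/tinyllava-1.1b-v0.1", "path/to/ckpt2"]):
        print(cmd)
```

Each printed line can be run in its own terminal. Keeping `--controller` fixed while `--port` and `--worker` change together is the only invariant the sketch enforces.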

If you are using an Apple device with an M1 or M2 chip, you can specify the mps device by using the --device flag: --device mps.

📄 License

This project is licensed under the Apache-2.0 license.