# 🚀 TinyLlama-based Visual Question Answering Model
This project is a visual question answering model trained with TinyLlama as the base model, enabling it to answer questions about images.
## 🚀 Quick Start
This model is trained using TinyLlama as the base model via the BakLlava repo.
## ✨ Features
- Utilizes TinyLlama as the base model for visual question answering tasks.
- Supports multi-model comparison in the Gradio interface.
## 📦 Installation
If you are not using Linux, do NOT proceed with these steps; see the LLaVA instructions for macOS and Windows instead.
- Clone this repository and navigate to the LLaVA folder

```bash
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
```
- Install the package

```bash
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
- Install additional packages for training cases

```bash
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
Upgrade to the latest code base:

```bash
git pull
pip install -e .
```
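To confirm the editable install succeeded, a quick import check can be run (a minimal sketch; any import error here points to an installation problem):

```bash
# If this prints without a traceback, the llava package is installed correctly
python -c "import llava; print('llava package imported successfully')"
```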
## 💻 Usage Examples
### Basic Usage
The prompt used for both examples was "What is shown in the given image?"
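To try a similar single-image query locally without the web UI, the LLaVA command-line interface can be used. This is a minimal sketch: the image path is a placeholder you would replace, and the model path is this project's checkpoint as used later in this README.

```bash
# Interactive CLI inference: after loading, type the question
# (e.g. "What is shown in the given image?") at the prompt
python -m llava.serve.cli \
    --model-path ameywtf/tinyllava-1.1b-v0.1 \
    --image-file "path/to/your_image.jpg"
```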
### Advanced Usage
#### Launch a controller

```bash
python -m llava.serve.controller --host 0.0.0.0 --port 10000
```
#### Launch a Gradio web server

```bash
python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload
```
You've just launched the Gradio web interface. You can now open it at the URL printed on the screen. You may notice that there is no model in the model list; don't worry, we haven't launched any model workers yet. The list will update automatically once you launch a model worker.
#### Launch a model worker

This is the actual worker that performs the inference on the GPU. Each worker is responsible for a single model specified in `--model-path`.

```bash
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ameywtf/tinyllava-1.1b-v0.1
```
Wait until the process finishes loading the model and you see "Uvicorn running on ...". Now, refresh your Gradio web UI, and you'll see the model you just launched in the model list.
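If GPU memory is tight, the upstream LLaVA model worker also documents quantized inference via `--load-4bit` (or `--load-8bit`) appended to the same command. This is a sketch based on the LLaVA repository and has not been verified with this specific checkpoint:

```bash
# Same worker launch as above, but with 4-bit quantized inference to reduce VRAM usage
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path ameywtf/tinyllava-1.1b-v0.1 --load-4bit
```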
You can launch as many workers as you want and compare between different model checkpoints in the same Gradio interface. Please keep the `--controller` the same and modify the `--port` and `--worker` to a different port number for each worker.

```bash
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port <different from 40000, say 40001> --worker http://localhost:<change accordingly, i.e. 40001> --model-path <ckpt2>
```
If you are using an Apple device with an M1 or M2 chip, you can specify the mps device by using the `--device` flag: `--device mps`.
## 📄 License
This project is licensed under the Apache-2.0 license.