MoAI-7B: Open-Source Model That Processes Image and Text Inputs and Generates Text Outputs

MoAI-7B

Developed by BK-Lee

MoAI is a large language and vision model that takes image and text inputs and generates text outputs.

Image-to-Text

Transformers

Open Source License: MIT #Multimodal Understanding #Image-Text Generation #High-Precision OCR

Downloads 183

Release Date: 3/12/2024

Model Overview

MoAI is a multimodal model that combines visual and language processing capabilities, enabling it to understand image content and generate relevant textual descriptions or answer questions.

Model Features

Multimodal Understanding

Capable of processing both image and text inputs simultaneously and understanding the relationship between them.

Hybrid Architecture

Combines the strengths of large language models and visual models.

Efficient Inference

Supports 4-bit quantization to reduce hardware requirements.
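
As a rough sketch of why 4-bit quantization lowers hardware requirements, the snippet below estimates the memory taken by the weights alone at 16 bits and at 4 bits per parameter. It assumes "7B" means exactly 7 billion parameters, and it ignores the auxiliary vision models, activations, and any quantization overhead, so the figures are a lower bound rather than a measured footprint.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: "7B" is taken as exactly 7 billion parameters.
NUM_PARAMS = 7e9

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)

print(f"fp16 weights:  ~{fp16_gib:.1f} GiB")
print(f"4-bit weights: ~{int4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights need a quarter of the fp16 memory, which is the main reason a quantized load can fit on a smaller GPU.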

Model Capabilities

Image Understanding

Text Generation

Visual Question Answering

Image Caption Generation

Use Cases

Content Understanding & Generation

Image Caption Generation

Generate detailed descriptions for input images.

Produces natural language descriptions of image content.

Visual Question Answering

Answer natural language questions about image content.

Accurately answers questions related to the image.

🚀 MoAI model

This repository stores the weights of the model presented in the paper "MoAI: Mixture of All Intelligence for Large Language and Vision Models". The model handles image-text-to-text tasks, such as generating a detailed text description of an image.

🚀 Quick Start

💻 Usage Examples

Basic Usage

The minimal example below is based on the MoAI GitHub repository and takes seven steps, numbered 0 to 6:

Step [0]: Clone the MoAI GitHub repository, install the required libraries, and set the necessary environment variable (the repository's README.md explains this in detail).

git clone https://github.com/ByungKwanLee/MoAI
bash install

Step [1]: Loading Image

from PIL import Image
from torchvision.transforms import Resize
from torchvision.transforms.functional import pil_to_tensor
# Read the image as a uint8 tensor of shape (C, H, W), then resize it to 490x490
image_path = "figures/moai_mystery.png"
image = Resize(size=(490, 490), antialias=False)(pil_to_tensor(Image.open(image_path)))

Step [2]: Instruction Prompt

prompt = "Describe this image in detail."

Step [3]: Loading MoAI

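from moai.load_moai import prepare_moai
# bits=4 loads the weights with 4-bit quantization; the call also returns the
# auxiliary segmentation, object detection, scene graph generation, and OCR models
moai_model, moai_processor, seg_model, seg_processor, od_model, od_processor, sgg_model, ocr_model \
    = prepare_moai(moai_path='BK-Lee/MoAI-7B', bits=4, grad_ckpt=False, lora=False, dtype='fp16')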

Step [4]: Pre-processing for MoAI

moai_inputs = moai_model.demo_process(
    image=image,
    prompt=prompt,
    processor=moai_processor,
    seg_model=seg_model,
    seg_processor=seg_processor,
    od_model=od_model,
    od_processor=od_processor,
    sgg_model=sgg_model,
    ocr_model=ocr_model,
    device='cuda:0',
)

Step [5]: Generate

import torch
with torch.inference_mode():
    generate_ids = moai_model.generate(**moai_inputs, do_sample=True, temperature=0.9, top_p=0.95, max_new_tokens=256, use_cache=True)
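
To illustrate what the temperature=0.9 and top_p=0.95 arguments control when do_sample=True, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering on a toy distribution. The function name top_p_filter and the toy logits are hypothetical; this is not the code that moai_model.generate runs internally, only the general idea behind the two parameters.

```python
import math


def top_p_filter(logits, temperature=0.9, top_p=0.95):
    """Return {token_index: probability} for the tokens kept by top-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalise so the kept probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Hypothetical scores for a 4-token vocabulary.
toy_logits = [4.0, 3.0, 1.0, -2.0]
nucleus = top_p_filter(toy_logits)
print(nucleus)
```

With these toy scores only the two most likely tokens survive the filter, and the next token would be sampled from that reduced set. Lowering temperature or top_p makes the output more deterministic; raising them makes it more varied.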

Step [6]: Decoding

answer = moai_processor.batch_decode(generate_ids, skip_special_tokens=True)[0].split('[U')[0]
print(answer)
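
The trailing .split('[U')[0] in the decoding step keeps only the text before the first occurrence of "[U", presumably to cut off a leftover marker that skip_special_tokens does not remove. The sketch below shows that behaviour on made-up strings; the helper name truncate_at_marker and the marker text "[UNUSED_TOKEN]" are hypothetical and chosen only for illustration.

```python
def truncate_at_marker(text: str, marker: str = "[U") -> str:
    """Keep everything before the first occurrence of marker (all of text if absent)."""
    return text.split(marker)[0]


# Hypothetical decoded strings, for illustration only.
with_marker = "A stone statue on a grassy hill.[UNUSED_TOKEN]"
without_marker = "A stone statue on a grassy hill."

print(truncate_at_marker(with_marker))
print(truncate_at_marker(without_marker))
```

One consequence of this approach is that an answer which legitimately contains "[U" would also be truncated at that point.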

📄 License

This project is licensed under the MIT license.
