MQT-LLaVA 7b

Developed by gordonhu
MQT-LLaVA is an open-source multimodal chatbot model built on the Transformer architecture, trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
Downloads: 349
Release Date: 5/28/2024

Model Overview

MQT-LLaVA is an open-source model for research on multimodal large models and chatbots. It accepts image and text inputs and generates text outputs.
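Since the model follows the LLaVA family's image-plus-text-in, text-out interface, a typical interaction could look like the minimal sketch below. It is a hypothetical example, not the project's official loading code: the Hub id gordonhu/MQT-LLaVA-7b and compatibility with the generic transformers LLaVA classes are assumptions, and the official repository may ship its own inference utilities.

```python
# Minimal inference sketch (assumptions noted above, not official code).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "gordonhu/MQT-LLaVA-7b"  # hypothetical Hub id

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# One image plus a text question in, generated text out.
image = Image.open("example.jpg")
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"

# Move inputs to the model's device and cast floating-point tensors to fp16.
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The USER/ASSISTANT prompt format mirrors LLaVA's default Vicuna-style conversation template; the actual checkpoint may expect a different template.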

Model Features

Open-source model
Completely open-source and available for research and commercial use (subject to the LLaMA 2 license)
Multimodal processing ability
Capable of handling image and text inputs simultaneously and generating coherent text responses
Large-scale training data
Trained on more than 1 million multimodal samples, including image-text pairs and instruction-following data

Model Capabilities

Multimodal dialogue
Visual question answering
Image understanding and description
Text generation
Instruction following

Use Cases

Academic research
Multimodal large model research
Used to explore joint visual-language representation learning
Chatbot development
Build dialogue systems that can understand image content
Educational applications
Visual-assisted learning
Help students understand complex concepts through images