ShareGPT4V-7B: Open-Source Multimodal Chatbot

ShareGPT4V-7B

Developed by Lin-Chen

ShareGPT4V-7B is an open-source multimodal chatbot model trained using GPT4-Vision-assisted data and LLaVA instruction fine-tuning data.

Downloads 530

Release Date: 11/20/2023

Model Overview

This model combines the CLIP vision tower with LLaMA/Vicuna and is intended for research on large multimodal models and chatbot applications.
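
To make the pairing of a vision tower with a language model concrete, the sketch below counts how many positions a single image occupies in the language model's input. It assumes a LLaVA-style design in which the vision tower splits the image into square patches and each patch embedding is projected into one token slot. The 336-pixel input size and 14-pixel patch size are illustrative assumptions, not values stated on this page, and `VisionTowerConfig` and `multimodal_sequence_length` are hypothetical names rather than part of any library.

```python
from dataclasses import dataclass


@dataclass
class VisionTowerConfig:
    """Hypothetical description of a ViT-style vision tower."""

    image_size: int = 336  # assumed input resolution in pixels
    patch_size: int = 14   # assumed side length of one square patch

    def num_patch_tokens(self) -> int:
        # A ViT cuts the image into a grid of non-overlapping patches.
        per_side = self.image_size // self.patch_size
        return per_side * per_side


def multimodal_sequence_length(
    num_text_tokens: int, cfg: VisionTowerConfig, num_images: int = 1
) -> int:
    """Sequence length after each image placeholder token is replaced
    by the projected patch embeddings of its image."""
    if num_images > num_text_tokens:
        raise ValueError("each image needs one placeholder token in the prompt")
    # Every placeholder takes one text position before expansion.
    return num_text_tokens - num_images + num_images * cfg.num_patch_tokens()


if __name__ == "__main__":
    cfg = VisionTowerConfig()
    print(cfg.num_patch_tokens())
    print(multimodal_sequence_length(20, cfg))
```

Under these assumed values, one image expands to 576 patch tokens, so a 20-token prompt containing a single image placeholder becomes a 595-position sequence for the language model.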

Multimodal Understanding

Capable of processing both image and text inputs to understand visual and textual content.

High-Quality Training Data

Utilizes 1.2 million high-quality image-text pairs and 100,000 GPT4-Vision-generated image-text pairs.

Open-Source and Extensible

Fully open-source and can be seamlessly loaded within the LLaVA codebase.

Image Understanding

Multimodal Dialogue

Image-Text Generation

Visual Question Answering

Research

Multimodal Model Research

Used for research at the intersection of computer vision and natural language processing.

Application Development

Intelligent Chatbot

Develop dialogue systems capable of understanding image content.

Property	Details
Model Type	ShareGPT4V-7B is an open-source chatbot trained by fine-tuning the CLIP vision tower and LLaMA/Vicuna on GPT4-Vision-assisted ShareGPT4V data and LLaVA instruction-tuning data.
Model Date	ShareGPT4V-7B was trained in Nov 2023.
Paper or Resources	[Project] [Paper] [Code]
