
MoE-LLaVA-StableLM-1.6B-4e

Developed by LanguageBind
MoE-LLaVA is a large vision-language model built on a mixture-of-experts architecture; by activating only a sparse subset of its parameters per token, it makes multimodal learning efficient.
Downloads 125
Release Date: 2024-01-23

Model Overview

The model adopts a mixture-of-experts architecture focused on visual-language understanding tasks, and matches or surpasses larger models on multiple benchmarks.

Model Features

Efficient Parameter Utilization
Matches the performance of 7B-scale dense models while activating only about 3 billion parameters per token
Outstanding Performance
Reaches LLaVA-1.5-7B level on multiple visual understanding tasks, and surpasses 13B models on some of them
Efficient Training
Training completes in under 2 days on 8 V100 GPUs
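The sparse activation behind these numbers comes from top-2 expert routing: for each token, a small router scores all experts and only the two best are evaluated, so most parameters stay inactive per token. The sketch below illustrates the general technique only; the function names, shapes, and values are hypothetical and are not MoE-LLaVA's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def top2_moe(x, expert_weights, gate_weights):
    """Illustrative sparse top-2 mixture-of-experts layer (not the model's real code).

    x: (d,) token representation
    expert_weights: list of (d, d) matrices, one per expert
    gate_weights: (num_experts, d) router matrix
    """
    logits = gate_weights @ x                 # router score for each expert
    top2 = np.argsort(logits)[-2:]            # indices of the 2 best experts
    probs = np.exp(logits[top2] - logits[top2].max())
    probs /= probs.sum()                      # softmax over the selected experts only
    # Only the two chosen experts run; the others contribute no compute.
    return sum(p * (expert_weights[i] @ x) for p, i in zip(probs, top2))

d, num_experts = 8, 4
experts = [rng.standard_normal((d, d)) for _ in range(num_experts)]
gate = rng.standard_normal((num_experts, d))
x = rng.standard_normal(d)
y = top2_moe(x, experts, gate)
print(y.shape)
```

With 4 experts and top-2 routing, roughly half of the expert parameters are active for any given token, which is how a model of this design can report ~3B activated parameters while holding more in total.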

Model Capabilities

Visual language understanding
Multimodal learning
Image analysis
Text generation

Use Cases

Visual Understanding
Image Caption Generation
Generates accurate textual descriptions for input images
Excellent performance on object hallucination benchmarks
Visual Question Answering
Answers complex questions about image content
Matches or surpasses the performance of larger-scale models
© 2025 AIbase