
LLaVA Llama 3 8B

Developed by Intel
A large multimodal model trained with the LLaVA-v1.5 framework, using the 8-billion-parameter Meta-Llama-3-8B-Instruct as its language backbone together with a CLIP-based visual encoder.
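In the LLaVA-v1.5 recipe, the CLIP encoder turns the image into a sequence of patch embeddings that are projected into the language model's embedding space and spliced in where an image placeholder appears in the text prompt. A minimal sketch of that placeholder expansion (the token names are illustrative, and the patch count of 576 assumes a CLIP ViT-L/14 encoder at 336px resolution; this checkpoint's exact template may differ):

```python
def expand_image_tokens(prompt: str, num_patches: int = 576) -> str:
    # Replace each "<image>" placeholder in the text prompt with one slot
    # per visual patch embedding; the language model then attends over the
    # projected image features occupying those slots.
    return prompt.replace("<image>", "<im_patch>" * num_patches)


# A small num_patches just to make the expansion visible:
print(expand_image_tokens("<image>\nDescribe the picture.", num_patches=3))
```

For a real forward pass, the placeholder slots are filled with projected CLIP features rather than literal tokens; the string-level expansion above only illustrates how image and text positions share one sequence.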
Downloads: 387
Release date: May 8, 2024

Model Overview

This model is fine-tuned for multimodal benchmark evaluations and can also be used as a multimodal chatbot.

Model Features

Multimodal Capability
Combines a visual encoder and language model to understand and generate text content related to images.
Strong Benchmark Performance
Achieves strong results on multiple multimodal benchmarks, such as GQA, MMVP, and POPE.
Based on LLaVA-v1.5 Framework
Utilizes an improved baseline for visual instruction tuning, enhancing performance in multimodal tasks.

Model Capabilities

Image Understanding
Multimodal Dialogue
Visual Question Answering
Image Caption Generation

Use Cases

Multimodal Evaluation
Multimodal Benchmark Testing
Used to evaluate the model's performance on multimodal tasks.
Achieved high scores on benchmarks such as GQA, MMVP, and POPE.
Chatbot
Multimodal Chat
Functions as a multimodal chatbot capable of understanding and answering image-related questions.