L

Llava Llama 3 8b V1 1 Transformers

Developed by xtuner
A LLaVA model fine-tuned based on Meta-Llama-3-8B-Instruct and CLIP-ViT-Large-patch14-336, supporting image-text-to-text tasks
Downloads 454.61k
Release Time : 4/26/2024

Model Overview

This is a multimodal model capable of understanding image content and generating relevant textual descriptions or answering questions about images.

Model Features

Multimodal Understanding
Combines visual encoder and language model to understand image content and generate relevant text
High Performance
Outperforms LLaVA-v1.5-7B model on multiple benchmarks
LoRA Fine-Tuning
Uses LoRA technology to fine-tune the visual encoder, improving model performance

Model Capabilities

Image content understanding
Image question answering
Multimodal dialogue
Visual reasoning

Use Cases

Visual Question Answering
Image Content Description
Provides detailed descriptions of image content
Accurately identifies objects, scenes, and relationships in images
Visual Reasoning
Answers reasoning questions about images
Excellent performance on benchmarks like MMBench
Education
Science Question Answering
Answers science questions based on images
Achieved 72.9 on ScienceQA test
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase