Lava Phi

Developed by sagar007
A vision-language model based on Microsoft's Phi-1.5 architecture, combined with CLIP for image processing.
Downloads 17
Release Time: 1/2/2025

Model Overview

This is a multimodal model capable of processing both image and text inputs to generate relevant text outputs.
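One common way to feed both modalities into a single language model is to project image-encoder features into the text embedding space and prepend them to the text token embeddings. The sketch below illustrates that idea with toy numbers in plain Python. It is an assumption about the general design pattern, not a description confirmed by this model card; the function names, dimensions, and weights are all hypothetical.

```python
# Illustrative sketch only: the projection-and-concatenate layout is an
# assumption, and every name and number here is hypothetical.

def project(features, weight):
    """Map each image feature vector into the text embedding space.

    `weight` has shape (text_dim, image_dim); each output vector is the
    matrix-vector product of `weight` with one image feature vector.
    """
    return [[sum(w * x for w, x in zip(row, vec)) for row in weight]
            for vec in features]

def build_input_sequence(image_features, text_embeddings, weight):
    """Prepend projected image tokens to the text token embeddings."""
    return project(image_features, weight) + text_embeddings

# Toy sizes: 2 image patches of dimension 3, text embedding dimension 2.
image_features = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
text_embeddings = [[0.5, 0.5]]

sequence = build_input_sequence(image_features, text_embeddings, weight)
print(len(sequence))  # 2 image tokens followed by 1 text token
```

In a real system the image features would come from the CLIP vision encoder and the combined sequence would be passed to the language model, which generates the text output.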

Model Features

Multimodal Capability
Combines text and image processing to understand images and generate text descriptions of them
Efficient Training
Trained with QLoRA (Quantized Low-Rank Adaptation), which fine-tunes small low-rank adapters on top of a 4-bit quantized base model to reduce memory use
Mixed Precision Training
Uses bfloat16 mixed precision to reduce memory use and speed up training
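To give a rough sense of why low-rank adapters and 4-bit weights save resources, the following back-of-the-envelope sketch counts the parameters a single adapter adds and estimates weight storage at different bit widths. The hidden size, adapter rank, and base parameter count are illustrative assumptions, not values taken from this model card, and the memory estimate ignores quantization metadata, activations, and optimizer state.

```python
# Back-of-the-envelope sketch; all constants below are assumed for illustration.

def lora_trainable_params(d_in, d_out, rank):
    """Parameters added by one low-rank adapter.

    The adapter factors the weight update into A (rank x d_in)
    and B (d_out x rank).
    """
    return rank * d_in + d_out * rank

def weight_memory_bytes(num_params, bits_per_param):
    """Storage for the weights alone at a given bit width."""
    return num_params * bits_per_param // 8

HIDDEN = 2048                  # assumed hidden size
RANK = 16                      # assumed adapter rank
BASE_PARAMS = 1_300_000_000    # assumed base model size

full = HIDDEN * HIDDEN
lora = lora_trainable_params(HIDDEN, HIDDEN, RANK)
print(f"full matrix: {full} params, adapter: {lora} params "
      f"({100 * lora / full:.2f}% of full)")

bf16_bytes = weight_memory_bytes(BASE_PARAMS, 16)
int4_bytes = weight_memory_bytes(BASE_PARAMS, 4)
print(f"bf16 weights: {bf16_bytes / 1e9:.2f} GB, "
      f"4-bit weights: {int4_bytes / 1e9:.2f} GB")
```

Under these assumptions the adapter trains a small fraction of the parameters in each adapted matrix, and the frozen base weights take about a quarter of the space they would in bfloat16.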

Model Capabilities

Image Understanding
Image Caption Generation
Visual Question Answering
Multimodal Dialogue

Use Cases

Image Understanding
Image Caption Generation
Generates detailed text descriptions for input images
Visual Question Answering
Image-based QA
Answers natural language questions about image content