nanoLLaVA

Developed by qnguyen3
nanoLLaVA is a 1B-parameter vision-language model designed to run efficiently on edge devices.
Downloads: 2,851
Release Time: 4/4/2024

Model Overview

nanoLLaVA is a compact yet capable vision-language model built on the Qwen1.5-0.5B language model and a SigLIP vision encoder, making it suitable for a range of multimodal tasks.

Model Features

Efficient Edge Computing
Designed for efficient operation on edge devices, delivering strong performance despite its small parameter count.
Multimodal Capabilities
Combines visual and language understanding abilities to handle joint tasks involving images and text.
Improved Version
A follow-up release, nanoLLaVA-1.5, offers significantly improved performance.

Model Capabilities

Visual Question Answering
Image Caption Generation
Multimodal Understanding
Text Generation
Image Analysis
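The capabilities above can be exercised through the Hugging Face `transformers` library. The sketch below is a minimal, hedged example of preparing a visual question answering prompt and loading the model; the repo id `qnguyen3/nanoLLaVA`, the ChatML prompt layout, and the `<image>` placeholder token are assumptions based on typical usage of Qwen1.5-based multimodal models, and the exact generation API exposed by the model's custom code may differ.

```python
def build_prompt(question: str) -> str:
    """Wrap a user question in the ChatML format used by Qwen1.5-based
    models, with an <image> placeholder for the visual input (assumed
    format -- check the model card for the authoritative template)."""
    return (
        "<|im_start|>user\n"
        f"<image>\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def main():
    # Network-dependent part, kept out of module import so the prompt
    # helper above can be used without downloading the checkpoint.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "qnguyen3/nanoLLaVA"  # assumed Hugging Face repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, trust_remote_code=True
    )
    image = Image.open("example.jpg")  # hypothetical local image
    prompt = build_prompt("What is shown in this image?")
    # The model's trust_remote_code implementation provides its own image
    # preprocessing and generation helpers; consult the model card for the
    # exact method names, which this sketch does not reproduce.
    ...


if __name__ == "__main__":
    main()
```

Because the model relies on custom code in its repository, `trust_remote_code=True` is required when loading it with `transformers`.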

Use Cases

Smart Assistants
Image Content Description
Generates detailed descriptions of user-provided images, identifying both the content and the contextual relationships within them.
Education
Scientific Question Answering
Answers image-based science questions, achieving 58.97% accuracy on the ScienceQA dataset.
© 2025 AIbase