nanoVLM
nanoVLM is a lightweight vision-language model (VLM) designed for efficient training and experimentation.
Downloads: 187
Release date: May 26, 2025
Model Overview
nanoVLM pairs a ViT-based image encoder with a lightweight causal language model, forming a compact vision-language model for multimodal tasks such as image captioning and visual question answering.
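The composition is straightforward: patch embeddings from the ViT are projected into the language model's embedding space and prepended to the text tokens before causal decoding. Below is a minimal conceptual sketch of that wiring; all class and method names are illustrative, not nanoVLM's actual API.

```python
# Conceptual sketch of the architecture described above.
# Class names and shapes are illustrative, not nanoVLM's real code.
import torch
import torch.nn as nn


class ModalityProjection(nn.Module):
    """Projects ViT patch embeddings into the language model's embedding space."""

    def __init__(self, vit_dim: int, lm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vit_dim, lm_dim)

    def forward(self, patch_embeds: torch.Tensor) -> torch.Tensor:
        return self.proj(patch_embeds)


class TinyVLM(nn.Module):
    """Image encoder + projection + causal LM, composed in the nanoVLM style."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vit_dim: int, lm_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.projection = ModalityProjection(vit_dim, lm_dim)
        self.language_model = language_model

    def forward(self, images: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # Encode the image into patch embeddings, project them into the LM's
        # space, then prepend the visual tokens to the text tokens.
        patches = self.vision_encoder(images)        # (B, N_patches, vit_dim)
        visual_tokens = self.projection(patches)     # (B, N_patches, lm_dim)
        sequence = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.language_model(sequence)         # next-token logits
```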
Model Features
Lightweight Design
The entire model architecture and training loop fit in roughly 750 lines of code, making the codebase easy to read, modify, and experiment with.
Compact Parameters
The image encoder and language model together total only 222 million parameters, small enough for efficient training and deployment.
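For reference, the combined count can be verified with a standard PyTorch one-liner, assuming `model` is an instantiated nanoVLM-style module such as the sketch above:

```python
# Count parameters; expect roughly 222M for nanoVLM-222M.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```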
Model Capabilities
Image-Text Generation
Multimodal Understanding
Use Cases
Research Experiment
Vision-Language Model Research
Used to study the performance and efficiency of lightweight vision-language models.
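As a starting point for such experiments, loading the published 222M checkpoint might look like the following. The module path and checkpoint name are taken from the public huggingface/nanoVLM repository at the time of writing and may change; verify against the current code.

```python
# Assumes the huggingface/nanoVLM repository has been cloned and you are
# running from its root; the import path below may differ between versions.
from models.vision_language_model import VisionLanguageModel

# Load the published ~222M-parameter checkpoint from the Hugging Face Hub.
model = VisionLanguageModel.from_pretrained("lusxvr/nanoVLM-222M")
model.eval()

# Generation is handled by the repository's generate.py script, which pairs
# the model with the matching image processor and tokenizer.
```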