MiniVLA Libero90 Prismatic
MiniVLA is a 1-billion-parameter vision-language model compatible with the Prismatic Vision-Language Model codebase, suitable for robotics and multimodal tasks.
Downloads 127
Release Time: 12/11/2024
Model Overview
MiniVLA is an efficient vision-language model that takes images and text as input and produces text as output, making it a fit for multimodal tasks and robotics applications. It is compatible with the Prismatic Vision-Language Model codebase, which supports full fine-tuning.
Model Features
Prismatic-Compatible
Compatible with the Prismatic Vision-Language Model codebase, enabling full fine-tuning using native PyTorch Fully Sharded Data Parallel (FSDP).
Efficient Multimodal
Supports multimodal processing of images and text, suitable for complex vision-language tasks.
Parameter-Efficient
With 1 billion parameters, the model reduces compute and memory demands relative to larger vision-language models while aiming to preserve task performance.
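To make the resource claim concrete, the sketch below estimates memory for a 1-billion-parameter model, both for inference and for full fine-tuning with sharding across devices as FSDP does. It is back-of-the-envelope arithmetic, not a measurement of MiniVLA: the function names are hypothetical, the parameter count is taken as exactly 1e9, and the byte sizes assume 16-bit weights for inference and 32-bit weights, gradients, and Adam optimizer state for training. Activations, buffers, and communication overhead are ignored, so real usage will be higher.

```python
# Rough memory estimates for a model of a given parameter count.
# All names and byte sizes here are illustrative assumptions, not values
# taken from the MiniVLA or Prismatic code.

GB = 1e9  # decimal gigabyte, in bytes


def inference_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone (default assumes 16-bit weights)."""
    return num_params * bytes_per_param / GB


def training_memory_gb(
    num_params: int,
    bytes_weights: int = 4,    # fp32 weights (assumption)
    bytes_grads: int = 4,      # fp32 gradients (assumption)
    bytes_optimizer: int = 8,  # Adam keeps two fp32 moments per parameter
    num_shards: int = 1,       # devices sharing the fully sharded state
) -> float:
    """Per-device memory for weights, gradients, and optimizer state.

    Full sharding divides all three evenly across devices; activations
    and framework overhead are deliberately left out.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    per_param = bytes_weights + bytes_grads + bytes_optimizer
    return num_params * per_param / num_shards / GB


if __name__ == "__main__":
    n = 1_000_000_000
    print(f"inference weights:      {inference_memory_gb(n):.1f} GB")
    print(f"training, 1 device:     {training_memory_gb(n):.1f} GB")
    print(f"training, 8-way shards: {training_memory_gb(n, num_shards=8):.1f} GB")
```

Under these assumptions the model state for full fine-tuning is about 16 bytes per parameter, which is why sharding it across several devices matters even at the 1-billion-parameter scale.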
Model Capabilities
Image-text-to-text generation
Multimodal processing
Robotic vision-language tasks
Use Cases
Robotics
Vision-Language Navigation
Helps robots understand visual inputs and generate corresponding text instructions.
Multimodal Interaction
Enables robots to interact with humans through vision and language.
Multimodal Applications
Image Caption Generation
Generates detailed textual descriptions from input images.