Minivla History2 Vq Libero90 Prismatic
MiniVLA is a compact, high-performance vision-language-action model. It is compatible with the Prismatic VLMs training scripts and suited to robotics and multimodal tasks.
Downloads 22
Release Time: 12/11/2024
Model Overview
MiniVLA is a vision-language-action model that supports image-text-to-text conversion with multimodal processing capabilities. The model is compatible with the Prismatic VLMs project codebase and suitable for full fine-tuning or parameter-efficient fine-tuning via LoRA.
Model Features
Compatible with Prismatic Training Scripts
Supports full fine-tuning with native PyTorch FSDP (Fully Sharded Data Parallel) and is compatible with the Prismatic VLMs project codebase.
Parameter-Efficient Fine-Tuning
Supports parameter-efficient fine-tuning via LoRA, which trains small low-rank adapter matrices instead of the full weights and is therefore well suited to limited computational resources.
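To make the LoRA idea concrete, here is a minimal pure-Python sketch of the general technique, not MiniVLA's or Prismatic's actual implementation. A frozen weight W is augmented with a low-rank update B·A scaled by alpha / rank, so only rank × (d_in + d_out) values are trained. The helper names (`matvec`, `lora_forward`, `lora_trainable_params`) and the layer sizes are hypothetical and chosen only for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x).

    W is the frozen (d_out x d_in) weight; A is (rank x d_in) and
    B is (d_out x rank). Only A and B would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_in, d_out, rank):
    """Number of trainable values in one adapter (A plus B)."""
    return rank * (d_in + d_out)


# Hypothetical layer size, NOT taken from MiniVLA.
d_in, d_out, rank = 1024, 1024, 8
adapter_params = lora_trainable_params(d_in, d_out, rank)
full_params = d_in * d_out
ratio = adapter_params / full_params

# Tiny worked example: with B initialised to zeros (the usual LoRA
# starting point) the adapted layer reproduces the frozen layer exactly.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
x = [1.0, 2.0]
y_init = lora_forward(W, A, [[0.0], [0.0]], x, alpha=2.0, rank=1)
y_trained = lora_forward(W, A, [[1.0], [2.0]], x, alpha=2.0, rank=1)
```

For the hypothetical 1024 × 1024 layer above, the rank-8 adapter trains 16,384 values against 1,048,576 in the full weight, which is where the memory savings come from.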
Multimodal Processing
Capable of processing joint image and text inputs for vision-language-action modeling.
Model Capabilities
Image-text-to-text conversion
Multimodal processing
Vision-language-action modeling
Use Cases
Robotics
Vision-Language-Action Control
Control robots to perform specific actions through image and text inputs.
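The control pattern described above can be sketched as a simple closed loop: at each step the policy receives the current camera image and the language instruction and returns an action for the robot. This is only an illustrative sketch. `Observation`, `run_episode`, `dummy_policy`, the stub camera, and the 7-value action vector are all assumptions made for the example and are not part of the MiniVLA or Prismatic APIs; a real deployment would replace `dummy_policy` with a call into the loaded model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Observation:
    """One policy input: a camera frame plus the task instruction."""
    image: List[List[int]]  # placeholder for real pixel data
    instruction: str


def run_episode(
    policy: Callable[[Observation], Sequence[float]],
    get_image: Callable[[], List[List[int]]],
    apply_action: Callable[[List[float]], None],
    instruction: str,
    max_steps: int,
) -> List[List[float]]:
    """Query the policy once per step and send each action to the robot."""
    trajectory = []
    for _ in range(max_steps):
        obs = Observation(image=get_image(), instruction=instruction)
        action = list(policy(obs))
        apply_action(action)
        trajectory.append(action)
    return trajectory


def dummy_policy(obs: Observation) -> List[float]:
    """Stand-in for the model: always returns a zero action.

    The 7-value action size is an assumption for illustration only.
    """
    return [0.0] * 7


# Stub camera and actuator so the sketch runs without hardware.
executed: List[List[float]] = []
trajectory = run_episode(
    policy=dummy_policy,
    get_image=lambda: [[0] * 4 for _ in range(4)],
    apply_action=executed.append,
    instruction="pick up the mug",
    max_steps=3,
)
```

Passing the camera, actuator, and policy in as callables keeps the loop independent of any particular robot stack, so the same structure can be exercised in simulation before touching hardware.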
Multimodal Interaction
Image Caption Generation
Generate text descriptions of input images.
© 2025 AIbase