MiniVLA Wrist VQ LIBERO-90 Prismatic
MiniVLA is a vision-language-action model focused on robotics, processing multimodal image-and-text inputs to produce text outputs.
Downloads: 18
Release Time: 12/12/2024
Model Overview
MiniVLA is a 1-billion-parameter vision-language-action model designed for robotics, capable of processing image and text inputs to generate text outputs. The model is compatible with Prismatic VLMs training scripts and suitable for full fine-tuning.
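The image-and-text-in, text-out flow can be sketched as below. Note that this checkpoint is distributed in Prismatic format for the Prismatic VLMs codebase, so the Hugging Face-style loading path, repository id, and prompt format used here are illustrative assumptions rather than the documented API.

```python
# Hedged inference sketch: image + instruction -> generated text (action tokens).
# The repo id, prompt format, and HF loading path are assumptions for illustration.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

repo_id = "Stanford-ILIAD/minivla-wrist-vq-libero90-prismatic"  # assumed repo id

processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda")

image = Image.new("RGB", (224, 224))  # stand-in for a wrist-camera frame
prompt = "In: What action should the robot take to pick up the mug?\nOut:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    "cuda", dtype=torch.bfloat16
)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)

# The generated text encodes (VQ) action tokens that a downstream decoder
# maps to robot commands.
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```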
Model Features
Prismatic Training Script Compatibility
Adopts a format compatible with the Prismatic VLMs project codebase, facilitating full fine-tuning with native PyTorch FSDP (a minimal sketch follows this feature list).
Multimodal Processing Capability
Capable of processing both image and text inputs to generate text outputs.
Robotics Optimization
Designed and optimized specifically for robotics applications.
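As a sketch of the FSDP-based full fine-tuning mentioned above: the tiny stand-in module and random batches below exist only to keep the example self-contained; in practice the Prismatic VLMs training scripts load the MiniVLA checkpoint and real robot-demonstration data, and configure wrapping, mixed precision, and checkpointing.

```python
# Minimal sketch of full fine-tuning with native PyTorch FSDP.
# Launch with: torchrun --nproc_per_node=<num_gpus> train_fsdp.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for the 1B-parameter MiniVLA backbone (illustration only).
    model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)).cuda()
    model = FSDP(model)  # shard parameters, gradients, and optimizer state across ranks

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    for _ in range(10):  # placeholder training loop
        x = torch.randn(8, 512, device="cuda")
        loss = model(x).pow(2).mean()  # dummy objective; the real loss is next-token CE
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```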
Model Capabilities
Image Understanding
Text Generation
Multimodal Processing
Robot Control
Use Cases
Robotics
Vision-Language Navigation
Robot navigation guided by visual observations and natural-language instructions
Multimodal Interaction
Robots that interpret combined visual and language inputs and respond accordingly
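In practice, the multimodal-interaction use case amounts to a closed observe-query-act loop. The sketch below uses hypothetical stubs for the camera, policy, and robot interfaces; a real deployment would replace dummy_policy with MiniVLA inference (e.g., the generate call above followed by a VQ action decoder) and send_action with a LIBERO simulator or robot driver.

```python
# Hedged sketch of an observe -> query -> act control loop with hypothetical stubs.
import numpy as np

def get_wrist_image() -> np.ndarray:
    """Hypothetical stand-in for a wrist-camera capture."""
    return np.zeros((224, 224, 3), dtype=np.uint8)

def dummy_policy(image: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for MiniVLA inference; returns a 7-DoF action (assumption)."""
    return np.zeros(7, dtype=np.float32)

def send_action(action: np.ndarray) -> None:
    """Hypothetical stand-in for a robot or simulator step."""
    print("stepping robot with action", action)

instruction = "put the bowl on the plate"  # example LIBERO-style language goal
for _ in range(100):                       # fixed-horizon episode
    obs = get_wrist_image()
    action = dummy_policy(obs, instruction)
    send_action(action)
```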