MiniVLA VQ Bridge Prismatic
MiniVLA is a more compact yet higher-performing vision-language-action model, compatible with the Prismatic VLMs project codebase.
Release Time: 12/12/2024
Model Overview
MiniVLA is a multimodal pretrained model for vision-language-action tasks. It takes an image and text as input and produces text as output (image-text-to-text).
Model Features
Compatible with Prismatic VLMs
Compatible with the original Prismatic VLMs project codebase, facilitating full fine-tuning with native PyTorch.
Parameter-Efficient Fine-Tuning Support
Supports parameter-efficient fine-tuning via LoRA, ideal for scenarios with limited computational resources.
Multimodal Capabilities
Combines vision and language processing abilities, suitable for complex multimodal tasks.
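To make the LoRA feature above more concrete, here is a minimal sketch of the general idea in plain Python: the base weight W stays frozen and only two small matrices A and B are trained, so the layer effectively applies W + (alpha / rank) * B @ A. This is not MiniVLA's or the Prismatic VLMs codebase's actual implementation. The helper names, the 4096 x 4096 layer size, and the rank of 16 are assumptions chosen only for illustration.

```python
# Illustrative sketch of the general LoRA idea.
# Not taken from MiniVLA or the Prismatic VLMs codebase; all names are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the weight a LoRA-adapted layer applies.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (rank, d_in)
    b: trainable up-projection, shape (d_out, rank)
    """
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, rank):
    """Return trainable parameter counts as (full fine-tuning, LoRA)."""
    return d_out * d_in, rank * (d_in + d_out)

# Assumed layer size and rank, for illustration only.
full, lora = trainable_params(d_out=4096, d_in=4096, rank=16)
print(f"full fine-tuning: {full} params, LoRA: {lora} params")
```

With these assumed sizes, the LoRA update trains far fewer parameters than the full weight matrix for that layer, which is why the approach suits limited computational resources. B is commonly initialized to zero so that training starts from the unmodified base model.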
Model Capabilities
Image-text conversion
Multimodal understanding
Vision-language-action processing
Use Cases
Robotics
Vision-Language-Action Control
Control robot actions through image and text inputs
Multimodal Applications
Image Caption Generation
Generate text descriptions based on images