
MiniVLA VQ Bridge Prismatic

Developed by Stanford-ILIAD
MiniVLA is a smaller yet higher-performing vision-language-action (VLA) model, compatible with the Prismatic VLMs project codebase.
Release date: 12/12/2024

Model Overview

MiniVLA is a pretrained multimodal model focused on vision-language-action tasks: it maps image and text inputs to text outputs (image-text-to-text).

Model Features

Compatible with Prismatic VLMs
Compatible with the original Prismatic VLMs project codebase, facilitating full fine-tuning with native PyTorch.
Parameter-Efficient Fine-Tuning Support
Supports parameter-efficient fine-tuning via LoRA, ideal for scenarios with limited computational resources.
Multimodal Capabilities
Combines vision and language processing abilities, suitable for complex multimodal tasks.
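The LoRA fine-tuning mentioned above can be sketched in plain PyTorch. This is an illustrative wrapper, not MiniVLA's actual training code: the Prismatic codebase has its own LoRA integration, and the class name and hyperparameters here (`LoRALinear`, `r=8`, `alpha=16`) are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: a frozen base linear layer plus a trainable
    low-rank update, y = W x + (alpha / r) * B (A x).
    Hypothetical illustration; actual frameworks differ in detail."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # A is small-random, B is zero, so the adapter starts as a no-op.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)  # only the two low-rank factors receive gradients
```

With `r=8` on a 512x512 layer, only about 3% of the parameters train, which is why LoRA suits limited-compute setups.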

Model Capabilities

Image-text-to-text generation
Multimodal understanding
Vision-language-action processing

Use Cases

Robotics
Vision-Language-Action Control
Control robot actions through image and text inputs
Multimodal Applications
Image Caption Generation
Generate text descriptions based on images
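In the robot-control use case above, a VLA model emits actions as discrete tokens that must be decoded back into continuous robot commands. The "VQ" in this model's name refers to a vector-quantized action tokenizer; as a self-contained illustration of the general token-to-action idea, here is a simpler uniform-binning sketch. The bin count, value range, and 7-DoF action layout are assumptions for the example, not MiniVLA's actual scheme.

```python
# Uniform action discretization sketch (hypothetical values, not MiniVLA's
# actual VQ codebook): each continuous action dimension in [LOW, HIGH] maps
# to one of N_BINS token ids, and back to the bin-center value.
N_BINS = 256
LOW, HIGH = -1.0, 1.0

def discretize(action):
    """Map each continuous action dimension to a bin index in [0, N_BINS)."""
    return [min(N_BINS - 1, int((a - LOW) / (HIGH - LOW) * N_BINS))
            for a in action]

def undiscretize(tokens):
    """Map bin indices back to the continuous bin-center values."""
    return [LOW + (t + 0.5) * (HIGH - LOW) / N_BINS for t in tokens]

# Example: a 7-DoF action (delta end-effector pose + gripper), assumed layout.
action = [0.1, -0.5, 0.0, 0.25, -1.0, 1.0, 0.9]
tokens = discretize(action)
recovered = undiscretize(tokens)
print(tokens)
print(recovered)
```

Round-tripping loses at most half a bin width per dimension; a learned VQ codebook replaces the fixed uniform bins with data-driven action prototypes.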