
MiniVLA History2 VQ Libero90 Prismatic

Developed by Stanford-ILIAD
MiniVLA is a compact yet high-performing vision-language-action model that is compatible with the Prismatic VLMs training scripts, making it suitable for robotics and multimodal tasks.
Downloads: 22
Release Date: 12/11/2024

Model Overview

MiniVLA is a vision-language-action model that performs image-text-to-text generation with multimodal processing. It is compatible with the Prismatic VLMs project codebase and supports both full fine-tuning and parameter-efficient fine-tuning via LoRA.
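As a rough, unverified sketch, the checkpoint can presumably be loaded through the Hugging Face transformers remote-code path used by related OpenVLA-style Prismatic models; the repository id, prompt format, and processor call below are assumptions, not documented APIs of this specific checkpoint.

```python
# Minimal loading sketch, assuming this checkpoint follows the OpenVLA-style
# remote-code convention on the Hugging Face Hub (unverified assumption).
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "Stanford-ILIAD/minivla-history2-vq-libero90-prismatic"  # assumed repo id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Joint image + text input: a camera frame plus a language instruction.
image = Image.open("frame.png")
prompt = "In: What action should the robot take to pick up the mug?\nOut:"
inputs = processor(prompt, image, return_tensors="pt").to(
    "cuda", dtype=torch.bfloat16
)

# Generate output tokens (decoded downstream into a robot action).
output_ids = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```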

Model Features

Compatible with Prismatic Training Scripts
Supports native PyTorch FSDP full fine-tuning and plugs into the Prismatic VLMs project codebase (see the fine-tuning sketch after this list).
Parameter-Efficient Fine-Tuning
Supports parameter-efficient fine-tuning via LoRA, which suits limited computational budgets (also sketched after this list).
Multimodal Processing
Processes joint image and text inputs for vision-language-action modeling.
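As a loose illustration of the two fine-tuning paths above, the sketch below wraps the model from the earlier loading example either in PyTorch FSDP (full fine-tuning) or in a LoRA adapter via the peft library; in practice you would pick one path, and the LoRA target-module names are assumptions that must match the checkpoint's actual layer names.

```python
# Sketch of the two fine-tuning paths; `model` is assumed to come from the
# earlier loading example, and the LoRA target modules are assumed names.
import functools

from peft import LoraConfig, get_peft_model
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

# --- Path 1: full fine-tuning with native PyTorch FSDP -------------------
# Assumes torch.distributed is already initialized (e.g. via torchrun).
wrap_policy = functools.partial(
    size_based_auto_wrap_policy, min_num_params=1_000_000
)
fsdp_model = FSDP(model, auto_wrap_policy=wrap_policy)

# --- Path 2: parameter-efficient fine-tuning with LoRA -------------------
lora_cfg = LoraConfig(
    r=32,                  # adapter rank
    lora_alpha=16,         # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed names
)
peft_model = get_peft_model(model, lora_cfg)
peft_model.print_trainable_parameters()  # only adapter weights are trainable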

Model Capabilities

Image-text-to-text generation
Multimodal processing
Vision-language-action modeling

Use Cases

Robotics
Vision-Language-Action Control
Control a robot to perform specific actions from joint image and text inputs (see the control-loop sketch after this list).
Multimodal Interaction
Image Caption Generation
Generate text descriptions for input images.
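To make the control use case concrete, here is a hypothetical closed-loop sketch that reuses the `processor` and `model` from the loading example; `env`, the observation key, and `decode_action` are placeholders invented for illustration, since the card does not specify the action-decoding interface.

```python
# Hypothetical control loop; `env`, obs["camera_rgb"], and decode_action()
# are illustrative placeholders, not APIs documented by this model.
import torch
from PIL import Image

instruction = "pick up the mug"
prompt = f"In: What action should the robot take to {instruction}?\nOut:"

obs = env.reset()  # hypothetical robot / simulator interface
for _ in range(200):
    image = Image.fromarray(obs["camera_rgb"])  # assumed observation key
    inputs = processor(prompt, image, return_tensors="pt").to(
        "cuda", dtype=torch.bfloat16
    )
    token_ids = model.generate(**inputs, max_new_tokens=32)
    action = decode_action(token_ids)  # hypothetical: tokens -> robot action
    obs, reward, done, info = env.step(action)
    if done:
        break
```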