
MiniVLA Wrist VQ LIBERO-90 Prismatic

Developed by Stanford-ILIAD
MiniVLA is a vision-language-action model for robotics that supports multimodal image-text-to-text tasks.
Downloads: 18
Release Date: 12/12/2024

Model Overview

MiniVLA is a 1-billion-parameter vision-language-action model designed for robotics. It processes image and text inputs to generate text outputs, and this checkpoint is distributed in a format compatible with the Prismatic VLMs training scripts, making it suitable for full fine-tuning. As the name suggests, this variant appears to target the LIBERO-90 benchmark with wrist-camera observations and vector-quantized (VQ) action tokens.
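
As a rough, hedged sketch of getting started, the checkpoint can be fetched from the Hugging Face Hub for use with the Prismatic training scripts. The repository id below is inferred from the developer and model name and should be verified before use.

    # Fetch the prismatic-format checkpoint for use with the Prismatic
    # VLMs training scripts. The repo id is an assumption inferred from
    # the model title -- verify it on the Hugging Face Hub.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="Stanford-ILIAD/minivla-wrist-vq-libero90-prismatic",
    )
    print(f"Checkpoint files downloaded to: {local_dir}")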

Model Features

Prismatic Training Script Compatibility
Uses a checkpoint format compatible with the Prismatic VLMs project codebase, enabling full fine-tuning with native PyTorch FSDP (see the sketch after this list).
Multimodal Processing Capability
Capable of processing both image and text inputs to generate text outputs.
Robotics Optimization
Designed and optimized specifically for robotics applications.
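
As a hedged illustration of the fine-tuning setup named above, the sketch below shows native PyTorch FSDP wrapping around a placeholder network. The real entry point is the Prismatic VLMs training scripts; the toy module here is a stand-in, not the actual MiniVLA architecture.

    # Minimal sketch of full fine-tuning with native PyTorch FSDP.
    # Launch with one process per GPU, e.g. `torchrun --nproc_per_node=N`.
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def main():
        dist.init_process_group("nccl")
        torch.cuda.set_device(dist.get_rank())

        # Stand-in for the 1B-parameter VLA model (an assumption).
        model = torch.nn.Sequential(
            torch.nn.Linear(1024, 4096),
            torch.nn.GELU(),
            torch.nn.Linear(4096, 1024),
        ).cuda()

        # FSDP shards parameters, gradients, and optimizer state across ranks.
        fsdp_model = FSDP(model)
        optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=2e-5)

        x = torch.randn(8, 1024, device="cuda")
        loss = fsdp_model(x).pow(2).mean()
        loss.backward()
        optimizer.step()

    if __name__ == "__main__":
        main()

Because each rank holds only a shard of every wrapped module's parameters, FSDP is what makes full fine-tuning of a model this size feasible across modest GPUs.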

Model Capabilities

Image Understanding
Text Generation
Multimodal Processing
Robot Control (illustrated in the sketch below)
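
To make the capabilities above concrete, here is a conceptual, hedged sketch of the vision-language-action loop: a camera frame and a language instruction go in, and a low-level robot action comes out. The VLAPolicy class is a hypothetical stand-in, not MiniVLA's real interface.

    import numpy as np

    class VLAPolicy:
        # Hypothetical stand-in for a vision-language-action model.
        def predict_action(self, image: np.ndarray, instruction: str) -> np.ndarray:
            # A real VLA tokenizes the image and instruction, generates
            # action tokens as text, and decodes them into a continuous
            # command; this stub just returns a zero action.
            return np.zeros(7)  # [dx, dy, dz, droll, dpitch, dyaw, gripper]

    policy = VLAPolicy()
    frame = np.zeros((224, 224, 3), dtype=np.uint8)  # e.g. a wrist-camera image
    action = policy.predict_action(frame, "pick up the red block")
    print(action)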

Use Cases

Robotics
Vision-Language Navigation
Robot navigation driven by combined visual observations and language instructions
Multimodal Interaction
Robots that understand visual and language inputs and respond accordingly