
MiniVLA Libero90 Prismatic

Developed by: Stanford-ILIAD
MiniVLA is a 1-billion-parameter vision-language model, compatible with the Prismatic Vision-Language Model codebase and suited to robotics and multimodal tasks.
Downloads: 127
Release date: 12/11/2024

Model Overview

MiniVLA is an efficient vision-language model that maps image and text inputs to text outputs, making it well suited to multimodal tasks and robotics applications. It is compatible with the Prismatic Vision-Language Model codebase, which supports full fine-tuning.

Model Features

Prismatic-Compatible
Compatible with the Prismatic Vision-Language Model codebase, enabling full fine-tuning using native PyTorch Fully Sharded Data Parallel (FSDP).
Efficient Multimodal Processing
Processes images and text jointly, supporting complex vision-language tasks.
Parameter-Efficient
At 1 billion parameters, the model reduces compute and memory demands relative to larger vision-language models while maintaining performance.

Model Capabilities

Image-text-to-text generation
Multimodal processing
Robotic vision-language tasks

Use Cases

Robotics
Vision-Language Navigation
Helps robots understand visual inputs and generate corresponding text instructions.
Multimodal Interaction
Enables robots to interact with humans through vision and language.
Multimodal Applications
Image Caption Generation
Generates detailed textual descriptions from input images.
© 2025 AIbase