MiniVLA VQ Libero90 Prismatic

Developed by Stanford-ILIAD
MiniVLA is a lightweight vision-language model compatible with the Prismatic VLMs training framework. It supports image-text-to-text multimodal tasks.
Downloads 31
Release Time: 12/11/2024

Model Overview

MiniVLA is a pretrained multimodal vision-language model for image-text-to-text tasks: it takes an image and text as input and produces text as output. The model is compatible with the Prismatic VLMs training framework and is suitable for full fine-tuning.

Model Features

Compatible with Prismatic Training Framework
The Prismatic VLMs project codebase can be used directly for full fine-tuning
Lightweight Design
Fewer parameters than large vision-language models while maintaining strong performance
Multimodal Capability
Handles tasks that require jointly understanding images and text and generating text in response
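
Since the checkpoint is meant to be used with the Prismatic VLMs codebase, a first step is usually to fetch the weights locally. The following is a minimal sketch, not an official procedure. It assumes the checkpoint is hosted on the Hugging Face Hub under a repository id formed from the developer and model names shown on this page (`Stanford-ILIAD/minivla-vq-libero90-prismatic`), and that the `huggingface_hub` package is installed. The function names and the `checkpoints` directory are hypothetical.

```python
from pathlib import Path

# Assumption: repository id inferred from the developer and model name on this page.
ORG = "Stanford-ILIAD"
MODEL = "minivla-vq-libero90-prismatic"


def checkpoint_repo_id(org: str = ORG, model: str = MODEL) -> str:
    """Build a Hugging Face Hub repository id of the form '<org>/<model>'."""
    return f"{org}/{model}"


def download_checkpoint(target_dir: str = "checkpoints") -> Path:
    """Download every file in the repository and return the local folder."""
    # Imported lazily so checkpoint_repo_id() works without the package installed.
    from huggingface_hub import snapshot_download

    local_dir = Path(target_dir) / MODEL
    path = snapshot_download(repo_id=checkpoint_repo_id(), local_dir=str(local_dir))
    return Path(path)
```

Calling `download_checkpoint()` would place the files under `checkpoints/minivla-vq-libero90-prismatic`. That folder can then be passed to the Prismatic VLMs training scripts, whose exact arguments are not covered on this page.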

Model Capabilities

Image Understanding
Text Generation
Multimodal Reasoning
Visual Question Answering

Use Cases

Robotics
Visual Navigation Command Understanding
Assists robots in understanding visual scenes and generating corresponding action commands
Content Generation
Image Caption Generation
Generates natural language descriptions based on input images