MiniVLA VQ Libero90 Prismatic
MiniVLA is a lightweight vision-language model compatible with the Prismatic VLMs training framework. It takes an image and text as input and produces text as output.
Downloads 31
Release Time: 12/11/2024
Model Overview
MiniVLA is a pretrained multimodal vision-language model for image-text-to-text tasks. It is compatible with the Prismatic VLMs training framework and is suitable for full fine-tuning.
Model Features
Compatible with Prismatic Training Framework
Can be fully fine-tuned directly with the Prismatic VLMs project codebase
Lightweight Design
Fewer parameters than large vision-language models, with the stated goal of maintaining strong performance
Multimodal Capability
Handles tasks that require jointly understanding images and text and generating text in response
Model Capabilities
Image Understanding
Text Generation
Multimodal Reasoning
Visual Question Answering
Use Cases
Robotics
Visual Navigation Command Understanding
Assists robots in understanding visual scenes and generating corresponding action commands
Content Generation
Image Caption Generation
Generates natural language descriptions based on input images
© 2025 AIbase