
SpatialVLA 4B Mix 224 PT

Developed by IPEC-COMMUNITY
SpatialVLA is a vision-language-action model, fine-tuned from its pretrained base on the fractal and bridge datasets and designed for robotic control tasks.
Downloads: 72
Release Time: 1/26/2025

Model Overview

The model maps language instructions and visual inputs to robot actions, making it suitable for developing general-purpose robot policies.
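The instruction-plus-image-to-action flow can be sketched with the Hugging Face transformers `trust_remote_code` loading pattern commonly used for VLA checkpoints. This is a minimal sketch, not verified against this repository: the inference helper names (`predict_action`, `decode_actions`) are assumptions, so check the model card for the exact API.

```python
# Hedged sketch of using a SpatialVLA checkpoint as a robot policy.
# The remote-code helper names below (predict_action, decode_actions)
# are assumptions based on common VLA model cards, not verified here.

def load_spatialvla(model_id="IPEC-COMMUNITY/spatialvla-4b-mix-224-pt"):
    """Load the processor and model via transformers with trust_remote_code."""
    from transformers import AutoModel, AutoProcessor
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()
    return processor, model

def predict_robot_action(processor, model, image, instruction):
    """Turn one camera frame plus a language instruction into an action chunk."""
    inputs = processor(images=[image], text=instruction, return_tensors="pt")
    outputs = model.predict_action(inputs)    # assumed remote-code helper
    return processor.decode_actions(outputs)  # assumed remote-code helper

if __name__ == "__main__":
    # Example usage (downloads a ~4B-parameter checkpoint; needs GPU + disk):
    # processor, model = load_spatialvla()
    # action = predict_robot_action(processor, model, camera_frame,
    #                               "pick up the red block")
    pass
```

The heavy imports live inside `load_spatialvla` so the sketch can be read and imported without transformers installed or the weights downloaded.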

Model Features

Vision-Language-Action Integration
Processes visual inputs and language instructions jointly and outputs robot action sequences
Large-Scale Pretraining
Pretrained on 1.1 million real robot demonstrations from the Open X-Embodiment and RH20T datasets
Domain-Adaptive Fine-Tuning
Optimized and fine-tuned for specific tasks on fractal and bridge datasets
Spatial Understanding Capability
Places particular emphasis on understanding and representing spatial relationships

Model Capabilities

Vision-Language Understanding
Robot Action Generation
Spatial Relationship Reasoning
Multimodal Task Processing

Use Cases

Robot Control
Object Grasping
Generates grasping action sequences based on visual inputs and language instructions
Performs well on Google Robot tasks
Spatial Navigation
Understands spatial relationships and generates navigation paths
Achieves good results on WidowX Robot tasks
© 2025 AIbase