
SpatialVLA 4B 224 PT

Developed by IPEC-COMMUNITY
SpatialVLA is a spatially enhanced vision-language-action model trained on 1.1 million real-world robot manipulation episodes, focused on robot control tasks.
Downloads: 13.06k
Release date: 1/26/2025

Model Overview

A vision-language-action model built on the PaliGemma 2 architecture that generates robot control actions from visual input and language instructions.

Model Features

Spatially enhanced representation
Explicitly optimized for spatial understanding, so the model handles spatial relationships in robot manipulation tasks more effectively.
Large-scale real-world data training
Trained on 1.1 million real robot manipulation episodes, giving it strong practical manipulation ability.
Concise and efficient implementation
Implemented entirely with HuggingFace Transformers, making it straightforward to deploy.
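Since the model ships as a standard Transformers checkpoint, inference can be sketched as follows. This is a minimal, non-authoritative example: the repository id `IPEC-COMMUNITY/spatialvla-4b-224-pt`, the `trust_remote_code` requirement, and the `predict_action` / `decode_actions` helpers follow the published model card and should be verified against the current card before use; `example.png` and the prompt text are placeholder inputs.

```python
# Minimal inference sketch for SpatialVLA (assumed API, see note above).
MODEL_ID = "IPEC-COMMUNITY/spatialvla-4b-224-pt"  # assumed HuggingFace repo id


def run_inference(image_path: str, prompt: str):
    # Heavy dependencies are imported lazily so the sketch can be read
    # (and the module imported) without torch/transformers installed.
    import torch
    from PIL import Image
    from transformers import AutoModel, AutoProcessor

    # trust_remote_code=True is required because the action head and the
    # custom processor live in the model repository, not in Transformers.
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=[image], text=prompt, return_tensors="pt")

    # Helper names are taken from the model card; they may change
    # between repository revisions.
    generation = model.predict_action(inputs)
    return processor.decode_actions(generation)


if __name__ == "__main__":
    actions = run_inference("example.png", "Pick up the red cup on the table.")
    print(actions)
```

The lazy imports keep the sketch inspectable on machines without GPU dependencies; in a deployment script the imports would normally sit at module level.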

Model Capabilities

Visual instruction understanding
Robot action generation
Spatial relationship reasoning
Multimodal task processing

Use Cases

Robot control
Object grasping
Generates an action sequence to grasp an object from visual input and language instructions.
Achieves zero-shot control on the WidowX robot.
New configuration adaptation
Adapts to a new robot configuration with a small amount of fine-tuning.
Successfully applied to the Franka robot.
Spatial understanding
Spatial relationship reasoning
Understands the spatial relationships between objects and generates the corresponding actions.
Performs strongly on the LIBERO benchmark.