
SpatialVLA 4B 224 SFT Bridge

Developed by IPEC-COMMUNITY
This model is a vision-language-action model based on SpatialVLA and fine-tuned on the Bridge dataset, designed specifically for the Simpler-env benchmark.
Downloads: 1,066
Release date: 3/16/2025

Model Overview

SpatialVLA is a vision-language-action model capable of generating robot motion instructions based on image and text inputs.

Model Features

Vision-Language-Action Integration
Capable of processing both visual and language inputs to output robot motion instructions.
Trained on Large-Scale Robot Data
Pre-trained on the Open X-Embodiment and RH20T datasets.
Spatial Understanding Capability
Specifically optimized for understanding and expressing spatial relationships.
Easy Deployment
Built entirely on HuggingFace Transformers, making deployment straightforward; see the loading sketch after this list.
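
Below is a minimal loading sketch, assuming the standard Transformers AutoClass workflow with trust_remote_code enabled. The action-prediction helpers (predict_action, decode_actions) and the unnorm_key value are assumptions based on the model's bundled custom code; verify the exact interface against the official model card before use.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Repo id as listed on HuggingFace; verify the exact name on the IPEC-COMMUNITY page.
model_name = "IPEC-COMMUNITY/spatialvla-4b-224-sft-bridge"

# The model ships custom modeling code, so trust_remote_code=True is required.
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()

# One RGB observation plus a natural-language instruction.
image = Image.open("observation.png").convert("RGB")
prompt = "What action should the robot take to pick up the spoon?"
inputs = processor(images=[image], text=prompt, return_tensors="pt")

with torch.no_grad():
    # predict_action / decode_actions are assumed helpers exposed by the model's
    # remote code; check the official model card for the exact interface and
    # the correct unnorm_key for the Bridge dataset statistics.
    outputs = model.predict_action(inputs)
actions = processor.decode_actions(outputs, unnorm_key="bridge_orig/1.0.0")
print(actions)  # decoded robot action(s) in the Bridge action space
```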

Model Capabilities

Vision-Language Understanding
Robot Motion Generation
Spatial Relationship Reasoning
Multimodal Task Processing

Use Cases

Robot Control
Object Grasping
Generates motion sequences for grasping objects based on visual input and text instructions; see the prompt sketch after this section.
Performs well in Google Robot tasks.
Object Placement
Places specified objects at target locations.
Demonstrates high success rates in WidowX Robot tasks.
Spatial Understanding
Spatial Relationship Reasoning
Understands relative positional relationships between objects.
Excels in spatial understanding evaluations.
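
To illustrate how the grasping and placement use cases above translate into prompts, here is a short sketch that reuses the processor, model, and image from the loading sketch in the Model Features section. The instruction strings are hypothetical examples written in the style of Simpler-env Bridge tasks, not verified task phrasings.

```python
# Illustrative instructions only; actual Simpler-env task phrasing may differ.
# Reuses `processor`, `model`, and `image` from the loading sketch above.
instructions = [
    "What action should the robot take to pick up the carrot?",          # object grasping
    "What action should the robot take to put the spoon on the towel?",  # object placement
]

for instruction in instructions:
    inputs = processor(images=[image], text=instruction, return_tensors="pt")
    with torch.no_grad():
        outputs = model.predict_action(inputs)
    actions = processor.decode_actions(outputs, unnorm_key="bridge_orig/1.0.0")
    print(instruction, "->", actions)
```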