SpatialVLA 4B 224 SFT Fractal

Developed by IPEC-COMMUNITY
SpatialVLA is a vision-language-action model fine-tuned on the fractal dataset, primarily used for robot control tasks.
Downloads 375
Release Time: 3/16/2025

Model Overview

This model takes a camera image and a natural-language instruction as input and outputs robot action commands, making it suitable for developing generalist robot policies.
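The input/output contract described above can be sketched as a closed control loop. This is a minimal illustration, not the model's actual API: every name here (`Action`, `stub_policy`, `run_episode`, `blank_camera`) is hypothetical, the 7-value action layout and gripper convention are assumptions, and `stub_policy` is a placeholder that stands in for the real model without running any inference.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# All names below are hypothetical and exist only for this illustration.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


@dataclass(frozen=True)
class Action:
    """Assumed 7-value command: position delta, rotation delta, gripper."""
    delta_xyz: Tuple[float, float, float]
    delta_rpy: Tuple[float, float, float]
    gripper: float  # assumed convention: 0.0 = closed, 1.0 = open


Policy = Callable[[Image, str], Action]


def stub_policy(image: Image, instruction: str) -> Action:
    """Stand-in for the model: ignores pixel content and returns a fixed
    downward move, closing the gripper when the instruction mentions 'pick'."""
    gripper = 0.0 if "pick" in instruction.lower() else 1.0
    return Action((0.0, 0.0, -0.01), (0.0, 0.0, 0.0), gripper)


def run_episode(policy: Policy, get_image: Callable[[], Image],
                instruction: str, max_steps: int) -> List[Action]:
    """Closed loop: fetch a fresh image, query the policy, record the action."""
    actions = []
    for _ in range(max_steps):
        actions.append(policy(get_image(), instruction))
    return actions


def blank_camera() -> Image:
    """Fake camera returning a 224x224 black frame (224 taken from the model name)."""
    return [[(0, 0, 0)] * 224 for _ in range(224)]


trajectory = run_episode(stub_policy, blank_camera, "pick up the can", max_steps=3)
print(len(trajectory), trajectory[0].gripper)
```

In a real deployment the stub would be replaced by a call into the released checkpoint, and each returned action would be sent to the robot before the next image is captured.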

Model Features

Multimodal Understanding
Capable of processing both visual and language inputs to comprehend complex scenes
Robot Action Generation
Generates precise robot action commands based on visual and language inputs
Large-Scale Pretraining
Pretrained on 1.1 million real-world robot episodes, enabling broad task adaptability

Model Capabilities

Visual Scene Understanding
Natural Language Instruction Parsing
Robot Action Planning
Multimodal Feature Fusion

Use Cases

Robot Control
Object Grasping
Plans grasping actions based on visual input and language instructions
Excellent performance on the SimplerEnv benchmark
Spatial Navigation
Understands spatial relationships and generates navigation paths
Achieved high scores in spatial understanding evaluations