Octo Small 1.5
Octo Small is a Transformer-based diffusion policy for robot control that predicts robot actions from visual inputs and language instructions.
Downloads 250
Release Time: 5/21/2024
Model Overview
The model is a 27-million-parameter Transformer designed for robot control tasks. It takes visual inputs (images from a main camera and a wrist camera) and a language instruction, and predicts a sequence of 4 future actions, each 7-dimensional. It is trained as a diffusion policy with an observation window of 2 timesteps.
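The numbers in the overview can be written down as a small shape check. This is a minimal sketch in plain Python, not the model's actual interface: the names `PolicyInput`, `check_input`, and `check_action_chunk` are hypothetical, and only the three constants come from the description above.

```python
from dataclasses import dataclass

# Constants restated from the overview; all names here are illustrative.
WINDOW_SIZE = 2      # observation timesteps fed to the model
ACTION_HORIZON = 4   # action steps predicted per query
ACTION_DIM = 7       # values per action step


@dataclass
class PolicyInput:
    """Hypothetical container for one model input."""
    primary_images: list   # WINDOW_SIZE frames from the main camera
    wrist_images: list     # WINDOW_SIZE frames from the wrist camera
    instruction: str       # natural-language task description


def check_input(inp: PolicyInput) -> None:
    """Raise ValueError if either camera stream does not fill the window."""
    streams = (("primary", inp.primary_images), ("wrist", inp.wrist_images))
    for name, frames in streams:
        if len(frames) != WINDOW_SIZE:
            raise ValueError(
                f"{name} camera: expected {WINDOW_SIZE} frames, got {len(frames)}"
            )


def check_action_chunk(chunk: list) -> None:
    """Raise ValueError unless chunk has ACTION_HORIZON rows of ACTION_DIM values."""
    if len(chunk) != ACTION_HORIZON:
        raise ValueError(f"expected {ACTION_HORIZON} steps, got {len(chunk)}")
    for row in chunk:
        if len(row) != ACTION_DIM:
            raise ValueError(f"expected {ACTION_DIM} values per step, got {len(row)}")


# One prediction therefore carries 4 * 7 scalar values.
values_per_prediction = ACTION_HORIZON * ACTION_DIM
```

What each of the 7 action dimensions means depends on the robot and dataset, so the sketch checks sizes only.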
Model Features
Multimodal input processing
Processes both visual inputs (main and wrist camera images) and language instructions
Diffusion policy
Trained as a diffusion policy that predicts 4-step sequences of 7-dimensional actions
Lightweight architecture
27-million-parameter Transformer architecture suitable for real-time robot control
Extensive dataset training
Trained on a mixture of 25 robot datasets from the Open X-Embodiment dataset
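To illustrate what predicting actions with a diffusion policy involves, the sketch below runs a generic DDPM-style reverse process over a 4 x 7 action chunk. It is a toy under stated assumptions, not the model's sampler: the step count, the linear noise schedule, and the function names are assumptions, and `oracle_noise` is a stand-in that computes the exact noise for a known target chunk, where the real model would use a learned network conditioned on images and the instruction.

```python
import math
import random

ACTION_HORIZON, ACTION_DIM = 4, 7   # chunk shape stated in the features above
NUM_STEPS = 20                      # assumed number of denoising steps

# Linear noise schedule (an assumption; the actual schedule is not given here).
betas = [1e-4 + (0.2 - 1e-4) * t / (NUM_STEPS - 1) for t in range(NUM_STEPS)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
_prod = 1.0
for _a in alphas:
    _prod *= _a
    alpha_bars.append(_prod)


def oracle_noise(x_t, t, target):
    """Stand-in for the learned noise predictor (hypothetical).

    Returns the exact noise that maps `target` to `x_t` at step t, so the
    sampling loop can be run without a trained network.
    """
    scale = math.sqrt(alpha_bars[t])
    denom = math.sqrt(1.0 - alpha_bars[t])
    return [[(x - scale * y) / denom for x, y in zip(x_row, y_row)]
            for x_row, y_row in zip(x_t, target)]


def sample_action_chunk(predict_noise, rng):
    """Denoise from Gaussian noise to an action chunk, one step at a time."""
    x = [[rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
         for _ in range(ACTION_HORIZON)]
    for t in reversed(range(NUM_STEPS)):
        eps = predict_noise(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        x = [[(xi - coef * ei) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
              for xi, ei in zip(x_row, e_row)]
             for x_row, e_row in zip(x, eps)]
    return x


# Toy target chunk; with the oracle predictor the sampler recovers it.
target = [[0.1 * (i + j) for j in range(ACTION_DIM)] for i in range(ACTION_HORIZON)]
chunk = sample_action_chunk(lambda x, t: oracle_noise(x, t, target), random.Random(0))
```

With a learned predictor in place of the oracle, different noise draws would generally yield different action chunks.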
Model Capabilities
Vision-language multimodal processing
Robot action prediction
Real-time control
Multi-task learning
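A policy that consumes a 2-step observation window and emits 4-step action chunks is often run in a receding-horizon loop: query the policy, execute some or all of the predicted actions, then query again with fresh observations. The sketch below shows that loop in plain Python. Everything in it is hypothetical scaffolding: `dummy_policy` returns zeros in place of the model, the observation and action callbacks are placeholders for robot I/O, and padding the initial history by repeating the first frame is an assumed convention.

```python
from collections import deque

WINDOW_SIZE = 2      # observation history length stated for this model
ACTION_HORIZON = 4   # predicted steps per query
ACTION_DIM = 7       # values per action step


def dummy_policy(history, instruction):
    """Hypothetical stand-in for the model: returns an all-zero action chunk."""
    assert len(history) == WINDOW_SIZE
    return [[0.0] * ACTION_DIM for _ in range(ACTION_HORIZON)]


def run_episode(policy, get_observation, send_action, instruction,
                num_queries, execute_steps=ACTION_HORIZON):
    """Query the policy, execute the first `execute_steps` actions, repeat.

    Returns the total number of actions sent to the robot.
    """
    first = get_observation()
    # Fill the window by repeating the first frame (an assumed padding choice).
    history = deque([first] * WINDOW_SIZE, maxlen=WINDOW_SIZE)
    executed = 0
    for _ in range(num_queries):
        chunk = policy(list(history), instruction)
        for action in chunk[:execute_steps]:
            send_action(action)
            executed += 1
            # Appending to a full deque drops the oldest frame automatically.
            history.append(get_observation())
    return executed


# Usage with placeholder I/O: integers stand in for camera frames.
frames = iter(range(1000))
sent = []
total = run_episode(dummy_policy, lambda: next(frames), sent.append,
                    "pick up the cup", num_queries=3, execute_steps=2)
```

Executing fewer than 4 actions per query trades extra inference calls for quicker reaction to new observations; which setting works best depends on the robot and task.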
Use Cases
Robot control
Vision-based object grasping
Controls a robot to grasp specific objects based on camera input and language instructions
Tabletop manipulation tasks
Performs various manipulation tasks in tabletop environments, such as pushing, pulling, and rotating objects
Industrial automation
Assembly line operations
Performs precise assembly tasks in industrial environments