Octo Small 1.5

Developed by rail-berkeley
Octo Small is a Transformer-based diffusion policy for robot control that predicts robot actions from visual inputs and language instructions.
Downloads 250
Release Time: 5/21/2024

Model Overview

This model is a 27-million-parameter Transformer designed for robot control. It takes images from a main camera and a wrist camera, together with a language instruction, and predicts a sequence of 4 actions, each 7-dimensional. It is trained as a diffusion policy with a window size of 2, meaning it sees two observation timesteps at once.
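
The input and output layout described above can be sketched in plain Python. This is only an illustration of the shapes, not the Octo API: the class `ObservationWindow`, the function `placeholder_policy`, the dictionary keys, and the choice to fill the window by repeating the first frame are all assumptions made for this example.

```python
from collections import deque

WINDOW_SIZE = 2      # observation timesteps seen by the model (from the text)
ACTION_HORIZON = 4   # number of predicted action steps (from the text)
ACTION_DIM = 7       # dimensions per action (from the text)


class ObservationWindow:
    """Hypothetical helper that keeps the most recent WINDOW_SIZE frames."""

    def __init__(self, size=WINDOW_SIZE):
        self.frames = deque(maxlen=size)

    def add(self, primary_image, wrist_image):
        # Key names are assumptions chosen for this sketch.
        frame = {"image_primary": primary_image, "image_wrist": wrist_image}
        if not self.frames:
            # Assumption: fill the window by repeating the very first frame.
            self.frames.extend([frame] * self.frames.maxlen)
        else:
            # The deque drops the oldest frame automatically.
            self.frames.append(frame)
        return list(self.frames)


def placeholder_policy(window, instruction):
    """Stand-in for the real model: returns a zero action chunk of the right shape."""
    if len(window) != WINDOW_SIZE:
        raise ValueError("expected exactly WINDOW_SIZE observation frames")
    return [[0.0] * ACTION_DIM for _ in range(ACTION_HORIZON)]


# Strings stand in for camera images to keep the sketch dependency-free.
window = ObservationWindow()
obs = window.add("main_t0", "wrist_t0")
obs = window.add("main_t1", "wrist_t1")
actions = placeholder_policy(obs, "pick up the spoon")
```

Here `actions` is a list of 4 rows with 7 values each, matching the 4-step, 7-dimensional output described above. A real deployment would replace `placeholder_policy` with the trained model and pass image arrays instead of strings.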

Model Features

Multimodal input processing
Capable of processing both visual inputs (camera images) and language instructions
Diffusion policy
Trained as a diffusion policy that predicts 4-step sequences of 7-dimensional actions
Lightweight architecture
27-million-parameter Transformer architecture suitable for real-time robot control
Extensive dataset training
Trained on a mixture of 25 robot datasets from the Open X-Embodiment dataset
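
To make the "diffusion policy" feature more concrete, the following is a minimal sketch of how a diffusion model can generate a 4-step, 7-dimensional action chunk: start from Gaussian noise and denoise it step by step. It uses generic DDPM ancestral sampling with a linear noise schedule, which is an assumption and not necessarily what Octo implements. The names `make_schedule`, `sample_action_chunk`, and `zero_noise_predictor` are hypothetical, and the noise predictor is a stand-in for the trained Transformer.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption) and its cumulative alpha products."""
    denom = max(num_steps - 1, 1)
    betas = [beta_start + (beta_end - beta_start) * t / denom for t in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def sample_action_chunk(predict_noise, context, horizon=4, action_dim=7,
                        num_steps=20, seed=0):
    """Generic DDPM ancestral sampling over a horizon x action_dim chunk."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    # Start from pure Gaussian noise.
    x = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)] for _ in range(horizon)]
    for t in reversed(range(num_steps)):
        # In a real policy the predictor is conditioned on images and language.
        eps = predict_noise(x, t, context)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(alphas[t])
        sigma = math.sqrt(betas[t])
        for i in range(horizon):
            for j in range(action_dim):
                mean = scale * (x[i][j] - coef * eps[i][j])
                # No noise is added on the final denoising step.
                noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
                x[i][j] = mean + sigma * noise
    return x


def zero_noise_predictor(x, t, context):
    """Placeholder for a trained network: always predicts zero noise."""
    return [[0.0] * len(row) for row in x]


chunk = sample_action_chunk(zero_noise_predictor, context=None)
```

With the placeholder predictor the resulting values are meaningless; the point is the control flow, in which a learned noise predictor is applied repeatedly to turn noise into an action chunk of the shape described above.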

Model Capabilities

Vision-language multimodal processing
Robot action prediction
Real-time control
Multi-task learning

Use Cases

Robot control
Vision-based object grasping
Controls a robot to grasp specific objects based on camera input and language instructions
Tabletop manipulation tasks
Performs manipulation tasks in tabletop environments, such as pushing, pulling, and rotating objects
Industrial automation
Assembly line operations
Performs precise assembly tasks in industrial environments