AIbase

# Vision-Language-Action

Hume System2
MIT
Hume-System2 provides the pre-trained System 2 weights for a dual-system Vision-Language-Action (VLA) model. The weights are intended to speed up System 2 training and to support robotics research and applications.
Multimodal Fusion, Transformers, English
Hume-vla
3,225
1
Minivla History2 Vq Libero90 Prismatic
MIT
MiniVLA is a compact, high-performance vision-language-action model that is compatible with the Prismatic VLMs training scripts and suited to robotics and multimodal tasks.
Image-to-Text, Transformers, English
Stanford-ILIAD
22
1
Rdt 170m
MIT
RDT-170M is a 170-million-parameter diffusion Transformer for imitation learning, designed for robot vision-language-action tasks.
Multimodal Fusion, Transformers, English
robotics-diffusion-transformer
278
7
© 2025 AIbase