# Vision-Language-Action
Hume-System2
MIT
Hume-System2 provides the pre-trained System 2 weights for a dual-system Vision-Language-Action (VLA) model. The weights are intended to accelerate System 2 training and to support robotics research and applications.
Multimodal Fusion
Transformers, English

Hume-vla
3,225
1
MiniVLA History2 Vq Libero90 Prismatic
MIT
MiniVLA is a compact yet high-performance vision-language-action model. It is compatible with the Prismatic VLMs training scripts and is suited to robotics and multimodal tasks.
Image-to-Text
Transformers, English

Stanford-ILIAD
22
1
RDT-170M
MIT
RDT-170M is a 170-million-parameter diffusion Transformer for imitation learning, designed for robot vision-language-action tasks.
Multimodal Fusion
Transformers, English

robotics-diffusion-transformer
278
7