MoVA-8B is an open-source multimodal large language model that uses a coarse-to-fine mechanism to adaptively route and fuse task-specific visual expert modules. It is intended for research on multimodal models and chatbots.
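The coarse-to-fine idea can be illustrated with a toy sketch. This is not MoVA's implementation: the expert names, relevance scores, feature vectors, and the functions `route_experts` and `fuse_features` are all hypothetical, and the softmax-weighted sum is only one simple way to fuse features. The coarse stage picks a subset of experts; the fine stage blends the features of the chosen experts.

```python
import math

def route_experts(scores, k=2):
    """Coarse stage: return the names of the k highest-scoring experts."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

def fuse_features(features, scores, selected):
    """Fine stage: softmax-weighted sum of the selected experts' features."""
    exps = [math.exp(scores[name]) for name in selected]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[selected[0]])
    fused = [0.0] * dim
    for name, weight in zip(selected, weights):
        for i, value in enumerate(features[name]):
            fused[i] += weight * value
    return fused, weights

# Hypothetical relevance scores and per-expert feature vectors (illustrative only).
scores = {"ocr": 2.0, "detection": 0.5, "segmentation": -1.0}
features = {
    "ocr": [1.0, 0.0],
    "detection": [0.0, 1.0],
    "segmentation": [1.0, 1.0],
}

selected = route_experts(scores, k=2)
fused, weights = fuse_features(features, scores, selected)
print(selected, weights, fused)
```

In this sketch the routing decision is a plain top-k over fixed scores; in a real system the scores would depend on the image and the user instruction, and the fusion would operate on full feature maps rather than two-element lists.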
Multimodal Fusion
Transformers