# Multimodal Learning

| Model | License | Publisher | Tags | Downloads | Likes | Description |
|---|---|---|---|---|---|---|
| Openvision Vit So400m Patch14 384 | Apache-2.0 | UCSC-VLAA | Multimodal Fusion | 238 | 0 | OpenVision is a fully open, cost-effective family of advanced vision encoders for multimodal learning. |
| Openvision Vit Base Patch16 160 | Apache-2.0 | UCSC-VLAA | Multimodal Fusion | 15 | 0 | OpenVision is a fully open-source, cost-effective family of advanced vision encoders for multimodal learning. |
| Openvision Vit Small Patch8 384 | Apache-2.0 | UCSC-VLAA | Multimodal Fusion | 21 | 0 | OpenVision is a fully open, cost-effective family of advanced vision encoders focused on multimodal learning. |
| Openvision Vit Small Patch16 224 | Apache-2.0 | UCSC-VLAA | Image Enhancement | 17 | 0 | OpenVision is a fully open, cost-effective family of advanced vision encoders focused on multimodal learning. |
| Med Dis B | N/A | therarelab | Video Processing | 14 | 0 | A PyTorch-based action recognition model for robotics applications. |
| Wedgit Stack Single Fixed | N/A | jclinton1 | Multimodal Fusion | 76 | 0 | A robot control model based on diffusion policy, released via the PyTorchModelHubMixin integration. |
| Genmedclip B 16 PMB | MIT | wisdomik | Image Classification | 408 | 0 | A zero-shot image classification model based on the open_clip library, specializing in medical image analysis. |
| Genmedclip | MIT | wisdomik | Image Classification | 40 | 0 | GenMedClip is a zero-shot image classification model based on the open_clip library, specializing in medical image analysis. |
| Moe LLaVA Qwen 1.8B 4e | Apache-2.0 | LanguageBind | Text-to-Image, Transformers | 176 | 14 | MoE-LLaVA is a large vision-language model built on a Mixture-of-Experts architecture, achieving efficient multimodal learning through sparsely activated parameters. |
| Echo Clip R | MIT | mkaichristensen | Image Classification | 547 | 4 | A zero-shot image classification model based on the open_clip library, supporting a variety of vision tasks. |
| Git 20 | MIT | uf-aice-lab | Image-to-Text, Transformers, Supports Multiple Languages | 18 | 1 | A multimodal model based on Microsoft's GIT framework, focused on extracting text from student homework images and generating teacher feedback. |
| Git Base Textvqa | MIT | Hellraiser24 | Large Language Model, Transformers, Other | 19 | 0 | A visual question answering model based on microsoft/git-base-textvqa and fine-tuned on the TextVQA dataset, excelling at question answering over text in images. |
| Dof Passport 1 | MIT | Sebabrata | Image-to-Text, Transformers | 16 | 0 | A model fine-tuned from naver-clova-ix/donut-base; its specific purpose is not explicitly stated. |
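Several entries above (GenMedClip, Echo Clip R) are CLIP-style zero-shot classifiers: the image and each candidate label prompt are embedded, and the label whose embedding is most cosine-similar to the image embedding wins. A minimal sketch of that scoring step, using toy NumPy vectors in place of real open_clip encoder outputs (all embeddings and prompt names here are illustrative assumptions, not the models' actual weights or prompts):

```python
import numpy as np

def normalize(x, axis=-1):
    # Project embeddings onto the unit sphere so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_classify(image_emb, text_embs, temperature=100.0):
    """Return class probabilities for one image embedding against N label embeddings."""
    image_emb = normalize(image_emb)
    text_embs = normalize(text_embs)
    logits = temperature * text_embs @ image_emb  # cosine similarities, scaled
    exp = np.exp(logits - logits.max())           # numerically stable softmax
    return exp / exp.sum()

# Toy 4-d embeddings standing in for real encoder outputs.
image = np.array([1.0, 0.0, 0.0, 0.1])
labels = np.array([
    [1.0, 0.0, 0.0, 0.0],   # e.g. "a chest X-ray" (hypothetical prompt)
    [0.0, 1.0, 0.0, 0.0],   # e.g. "an echocardiogram" (hypothetical prompt)
])
probs = zero_shot_classify(image, labels)
print(probs.argmax())  # 0 -> the first prompt matches best
```

In the real models, `image_emb` and `text_embs` would come from the vision and text towers of an open_clip checkpoint; the temperature plays the role of CLIP's learned logit scale.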
© 2025 AIbase