# Japanese Multimodal

Llama 3 EvoVLM JP V2

Llama-3-EvoVLM-JP-v2 is an experimental general-purpose Japanese vision-language model that accepts interleaved text and image input. It was created using an evolutionary model merging approach.

Transformers Japanese

Clip Japanese Base

A Japanese CLIP model developed by LY Corporation, trained on approximately 1 billion web-collected image-text pairs, suitable for various vision tasks.

Transformers Japanese

line-corporation

Japanese Clip Vit B 32 Roberta Base

A Japanese version of the CLIP model that maps Japanese text and images into the same embedding space, suitable for zero-shot image classification, text-image retrieval, and other tasks.
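As a rough illustration of what a shared embedding space enables, the sketch below performs zero-shot classification by comparing one image vector against one text vector per candidate label. The three-dimensional vectors are hypothetical toy values rather than real model output, and the names `cosine`, `zero_shot_classify`, and the `temperature` parameter are assumptions made for this example; in practice the embeddings would come from the model's image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_vec, label_vecs, temperature=0.01):
    """Pick the label whose text embedding is closest to the image embedding.

    Returns (best_label, probabilities), where probabilities is a softmax
    over the temperature-scaled cosine similarities.
    """
    sims = {label: cosine(image_vec, vec) for label, vec in label_vecs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {label: math.exp((s - top) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical toy embeddings; a real model would produce these vectors.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "犬": [1.0, 0.0, 0.1],  # dog
    "猫": [0.1, 1.0, 0.0],  # cat
    "車": [0.0, 0.2, 1.0],  # car
}

best_label, probabilities = zero_shot_classify(image_embedding, label_embeddings)
```

Text-image retrieval follows the same pattern with the roles reversed: embed one text query, score it against many image vectors, and rank the images by similarity.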

Transformers Japanese

Japanese Cloob Vit B 16

A Japanese CLOOB (Contrastive Leave-One-Out Boost) model trained by rinna Co., Ltd. for cross-modal understanding of images and text.

Transformers Japanese

Japanese Clip Vit B 16

A Japanese CLIP model trained by rinna Co., Ltd. using contrastive learning between Japanese text and images.

Transformers Japanese

Clip Vit B 32 Japanese V1

This is a Japanese CLIP text/image encoder model converted from the English CLIP model through distillation techniques.

Transformers Japanese

© 2025 AIbase