# Japanese Multimodal
Llama 3 EvoVLM JP V2
Llama-3-EvoVLM-JP-v2 is an experimental general-purpose Japanese vision-language model that accepts interleaved text and image inputs. It was created with the Evolutionary Model Merge method.
Image-to-Text
Transformers Japanese

SakanaAI
475
20
Clip Japanese Base
Apache-2.0
A Japanese CLIP model developed by LY Corporation, trained on approximately 1 billion web-collected image-text pairs, suitable for various vision tasks.
Text-to-Image
Transformers Japanese

line-corporation
14.31k
22
Japanese Clip Vit B 32 Roberta Base
A Japanese version of the CLIP model that maps Japanese text and images into the same embedding space, suitable for zero-shot image classification, text-image retrieval, and other tasks.
Text-to-Image
Transformers Japanese

recruit-jp
384
9
Japanese Cloob Vit B 16
Apache-2.0
A Japanese CLOOB (Contrastive Leave One Out Boost) model trained by rinna Co., Ltd. for cross-modal understanding of images and text.
Text-to-Image
Transformers Japanese

rinna
229.51k
12
Japanese Clip Vit B 16
Apache-2.0
A Japanese CLIP model trained by rinna Co., Ltd. with contrastive learning on Japanese text and images.
Text-to-Image
Transformers Japanese

rinna
26.12k
21
Clip Vit B 32 Japanese V1
A Japanese CLIP text/image encoder model derived from the English CLIP model through distillation.
Text-to-Image
Transformers Japanese

sonoisa
690
21