
Japanese CLIP ViT-B/16

Developed by rinna
A Japanese CLIP model trained by rinna Co., Ltd., supporting contrastive learning between Japanese text and images.
Downloads: 26.12k
Release date: 4/27/2022

Model Overview

This model is a multimodal model based on the CLIP architecture, capable of mapping Japanese text and images into the same feature space for cross-modal retrieval and classification tasks.
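Mapping text and images into the same feature space means both encoders produce vectors that can be compared directly: after L2-normalization, a dot product is the cosine similarity between an image and a caption. A minimal NumPy sketch of that matching step, using made-up toy vectors rather than real model outputs (the actual model produces higher-dimensional embeddings):

```python
import numpy as np

def normalize(x):
    # L2-normalize each row so dot products become cosine similarities
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy stand-ins for the model's image and text embeddings
# (hypothetical 4-dim vectors for illustration only)
image_emb = normalize(np.array([[0.9, 0.1, 0.0, 0.2]]))
text_embs = normalize(np.array([
    [0.8, 0.2, 0.1, 0.1],   # e.g. embedding of "犬" (dog)
    [0.0, 0.9, 0.3, 0.0],   # e.g. embedding of "猫" (cat)
]))

# Cosine similarity between the image and each candidate text
sims = image_emb @ text_embs.T
best = int(sims.argmax())  # index of the best-matching text
```

The same similarity matrix underlies both retrieval and classification; only the direction of the comparison changes.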

Model Features

Japanese-Specific
A CLIP model optimized for Japanese, learning alignments between Japanese text and images
Multimodal Capability
Processes both image and text inputs for cross-modal feature extraction and matching
Pretrained Model
Pretrained on the large-scale CC12M dataset and ready for direct use in downstream tasks

Model Capabilities

Image Feature Extraction
Japanese Text Feature Extraction
Image-Text Similarity Calculation
Cross-modal Retrieval

Use Cases

Image Classification
Multi-label Image Classification
Classify images using Japanese labels
Outputs a probability distribution over the labels
Cross-modal Search
Text-to-Image Search
Retrieve relevant images from Japanese text descriptions
Image-to-Text Search
Retrieve matching Japanese text descriptions from an image
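All three use cases reduce to operations on one image-by-label similarity matrix: a softmax along the label axis gives per-label probabilities for classification, an argmax down the columns is text-to-image search, and an argmax across the rows is image-to-text search. A sketch with hypothetical similarity scores (real scores would come from the model's two encoders):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: probabilities along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy similarity matrix: rows = images, columns = Japanese labels
# (hypothetical values standing in for scaled cosine similarities)
logits = np.array([
    [25.0, 18.0, 12.0],   # image 0 vs labels ["犬", "猫", "象"]
    [10.0, 27.0, 15.0],   # image 1 vs the same labels
])

# Classification: probability distribution over labels for each image
probs = softmax(logits, axis=1)

# Text-to-image search: best image for each label (argmax down columns)
best_image_per_label = logits.argmax(axis=0)

# Image-to-text search: best label for each image (argmax across rows)
best_label_per_image = logits.argmax(axis=1)
```

Scaling the cosine similarities before the softmax (CLIP uses a learned temperature) controls how peaked the resulting distribution is.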