
Japanese CLIP ViT-B/32 RoBERTa Base

Developed by recruit-jp
A Japanese version of the CLIP model that maps Japanese text and images into the same embedding space, suitable for zero-shot image classification, text-image retrieval, and other tasks.
Downloads: 384
Released: 12/20/2023

Model Overview

This model is a Japanese version of CLIP (Contrastive Language-Image Pretraining), built on a ViT-B/32 image encoder and a RoBERTa-base text encoder and optimized specifically for Japanese.

Model Features

Japanese Optimization
Specifically optimized for Japanese text and images, outperforming general-purpose multilingual CLIP models on Japanese tasks.
Dual-Modal Embedding
Capable of mapping images and text into the same embedding space, enabling cross-modal retrieval and comparison.
Zero-shot Learning
Performs image classification and retrieval tasks without task-specific training.

Model Capabilities

Zero-shot image classification
Text-image retrieval
Image feature extraction
Text feature extraction
Cross-modal similarity calculation
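The capabilities above all reduce to comparing image and text vectors in the shared embedding space. A minimal sketch of zero-shot classification via cosine similarity, using random vectors as stand-ins for the model's real encoder outputs (the 512-dimensional size and the example captions are illustrative assumptions, not taken from the model card):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-computed embeddings: in practice these would come from the
# model's image and text encoders. The 512-dim shared space is an assumption
# for illustration.
image_emb = rng.normal(size=512)
label_texts = ["犬の写真", "猫の写真", "車の写真"]  # candidate Japanese captions
text_embs = rng.normal(size=(len(label_texts), 512))

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

image_emb = l2_normalize(image_emb)
text_embs = l2_normalize(text_embs)

# Cosine similarity between the image and each candidate caption.
sims = text_embs @ image_emb

# Zero-shot classification: pick the caption most similar to the image.
pred = label_texts[int(np.argmax(sims))]
print(pred)
```

The same dot products, computed image-against-many-texts or text-against-many-images, drive text-image retrieval: rank the candidates by similarity instead of taking only the argmax.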

Use Cases

E-commerce
Product Image Search: search for relevant product images using Japanese text descriptions, improving search accuracy and user experience.

Content Management
Automatic Image Tagging: automatically generate Japanese tags for images, reducing manual labeling costs.
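Automatic tagging differs from classification in that several tags may apply to one image, so a similarity threshold replaces the argmax. A sketch under the same assumptions as before (random stand-in embeddings, an illustrative tag vocabulary, and an arbitrary 0.2 threshold that would need tuning per dataset):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embeddings: one image compared against a Japanese tag vocabulary.
tag_vocab = ["屋外", "食べ物", "動物", "建物"]
image_emb = rng.normal(size=512)
tag_embs = rng.normal(size=(len(tag_vocab), 512))

# Normalize so dot products are cosine similarities.
image_emb /= np.linalg.norm(image_emb)
tag_embs /= np.linalg.norm(tag_embs, axis=-1, keepdims=True)

# Multi-label tagging: keep every tag whose similarity clears the threshold.
# With random vectors most similarities sit near zero, so real encoder
# outputs are needed for meaningful tags.
sims = tag_embs @ image_emb
tags = [t for t, s in zip(tag_vocab, sims) if s > 0.2]
```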