Clip Japanese Base
A Japanese CLIP model developed by LY Corporation, trained on approximately 1 billion web-collected image-text pairs and suitable for a range of vision-language tasks.
Downloads: 14.31k
Release Date: 4/24/2024
Model Overview
This model is a Japanese version of Contrastive Language-Image Pre-training (CLIP), suitable for tasks such as zero-shot image classification and text-to-image or image-to-text retrieval.
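The sketch below shows how such a model is typically loaded and used to compute image and text embeddings. It assumes the model is published on the Hugging Face Hub under an identifier like `line-corporation/clip-japanese-base` and exposes the standard CLIP-style `get_image_features` / `get_text_features` interface via `trust_remote_code`; the local image path and Japanese captions are illustrative.

```python
# Minimal sketch: load the model and compare one image against Japanese captions.
# The Hub identifier and the get_*_features API are assumptions, not confirmed here.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel, AutoTokenizer

MODEL_ID = "line-corporation/clip-japanese-base"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)

image = Image.open("example.jpg")                 # any local image (hypothetical path)
image_inputs = processor(image, return_tensors="pt")
text_inputs = tokenizer(["犬", "猫", "象"])        # "dog", "cat", "elephant"

with torch.no_grad():
    image_features = model.get_image_features(**image_inputs)
    text_features = model.get_text_features(**text_inputs)

# Similarity between the image and each Japanese caption, as class probabilities
probs = (image_features @ text_features.T).softmax(dim=-1)
print(probs)
```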
Model Features
Powerful Japanese Vision-Language Understanding
A CLIP model specifically optimized for Japanese, capable of understanding the relationships between Japanese text and images.
Efficient Architecture Design
Uses Eva02-B as the image encoder, which is more efficient than conventional ViT architectures.
Large-scale Pretraining Data
Trained on approximately 1 billion web-collected image-text pairs, covering diverse scenarios.
Model Capabilities
Zero-shot Image Classification
Text-to-Image Retrieval
Image-to-Text Retrieval
Cross-modal Feature Extraction
Use Cases
Image Retrieval
Japanese Description-based Image Search
Retrieve relevant images using Japanese text queries
Achieves an R@1 of 0.30 on the STAIR Captions dataset (see the retrieval sketch below)
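A minimal text-to-image retrieval sketch: embed a small image gallery once, then rank the images against a Japanese query by cosine similarity. The Hub identifier, the `get_*_features` API, the gallery file names, and the query text are assumptions for illustration.

```python
# Sketch: rank a gallery of images against a Japanese text query.
# Hub identifier, API surface, and file names are illustrative assumptions.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoImageProcessor, AutoModel, AutoTokenizer

MODEL_ID = "line-corporation/clip-japanese-base"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)

gallery_paths = ["beach.jpg", "ramen.jpg", "shrine.jpg"]  # hypothetical files
images = [Image.open(p) for p in gallery_paths]

with torch.no_grad():
    image_inputs = processor(images, return_tensors="pt")
    image_emb = F.normalize(model.get_image_features(**image_inputs), dim=-1)

    query = tokenizer(["夕暮れの海辺"])               # "seaside at dusk"
    text_emb = F.normalize(model.get_text_features(**query), dim=-1)

# Cosine similarity is the dot product of L2-normalized embeddings
scores = (text_emb @ image_emb.T).squeeze(0)
best = scores.argmax().item()
print(f"Top-1 match: {gallery_paths[best]} (score={scores[best].item():.3f})")
```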
Image Classification
Zero-shot Japanese Image Classification
Classify images without fine-tuning
Achieves 89% accuracy on the Recruit Datasets benchmark (see the classification sketch below)
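A zero-shot classification sketch: score one image against Japanese class prompts and take the argmax, with no task-specific fine-tuning. The Hub identifier, the prompt template, the class names, and the image path are illustrative assumptions.

```python
# Sketch: zero-shot classification with Japanese class prompts.
# Hub identifier, prompt template, and class names are illustrative assumptions.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel, AutoTokenizer

MODEL_ID = "line-corporation/clip-japanese-base"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)

class_names = ["犬", "猫", "鳥"]                    # "dog", "cat", "bird"
prompts = [f"{c}の写真" for c in class_names]       # "a photo of a {c}"

image_inputs = processor(Image.open("pet.jpg"), return_tensors="pt")
text_inputs = tokenizer(prompts)

with torch.no_grad():
    image_emb = model.get_image_features(**image_inputs)
    text_emb = model.get_text_features(**text_inputs)

# Softmax over prompt similarities gives per-class probabilities
probs = (image_emb @ text_emb.T).softmax(dim=-1).squeeze(0)
for name, p in zip(class_names, probs.tolist()):
    print(f"{name}: {p:.3f}")
```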