
Clip Japanese Base

Developed by line-corporation
A Japanese CLIP model developed by LY Corporation, trained on approximately 1 billion web-collected image-text pairs, suitable for various vision tasks.
Downloads 14.31k
Released: April 24, 2024

Model Overview

This model is a Japanese version of Contrastive Language-Image Pretraining (CLIP), suitable for tasks such as zero-shot image classification and text-to-image or image-to-text retrieval.
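The scoring mechanics behind zero-shot classification are not spelled out on this page; the following is a minimal numpy sketch of how a CLIP-style model picks a label: embed the image and each candidate Japanese label text, then take a softmax over the cosine similarities. The embeddings here are synthetic stand-ins, not outputs of this model.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_probs(image_emb, text_embs, temperature=0.01):
    """Softmax over cosine similarities between one image and N label texts."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_embs)
    logits = txt @ img / temperature
    e = np.exp(logits - logits.max())  # stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
labels = ["犬", "猫", "車"]  # candidate Japanese labels ("dog", "cat", "car")
text_embs = rng.normal(size=(3, 512))          # hypothetical text embeddings
image_emb = text_embs[1] + 0.1 * rng.normal(size=512)  # image close to "cat"
probs = zero_shot_probs(image_emb, text_embs)
print(labels[int(np.argmax(probs))])  # → 猫
```

The temperature sharpens the distribution; real CLIP models learn this value during training.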

Model Features

Powerful Japanese Vision-Language Understanding
A CLIP model specifically optimized for Japanese, capable of understanding relationships between Japanese text and images.
Efficient Architecture Design
Uses Eva02-B as the image encoder, which is more efficient than a standard ViT of comparable size.
Large-scale Pretraining Data
Trained on approximately 1 billion web-collected image-text pairs, covering diverse scenarios.

Model Capabilities

Zero-shot Image Classification
Text-to-Image Retrieval
Image-to-Text Retrieval
Cross-modal Feature Extraction

Use Cases

Image Retrieval
Japanese Description-based Image Search
Retrieve relevant images using Japanese text queries
Achieves R@1 of 0.30 on the STAIR Captions dataset
Image Classification
Zero-shot Japanese Image Classification
Classify images without fine-tuning
Achieves 89% accuracy on the Recruit Datasets
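R@1 (Recall@1) above is the fraction of text queries whose top-1 retrieved image is the correct pair. A minimal sketch of that metric, using synthetic embeddings rather than this model's outputs:

```python
import numpy as np

def recall_at_1(text_embs, image_embs):
    """Fraction of text queries whose nearest image (by cosine) is its true pair."""
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    i = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    top1 = np.argmax(t @ i.T, axis=1)  # index of the best image per text query
    return float(np.mean(top1 == np.arange(len(t))))

rng = np.random.default_rng(0)
images = rng.normal(size=(100, 256))
texts = images + 0.3 * rng.normal(size=(100, 256))  # noisy matched captions
r1 = recall_at_1(texts, images)
print(r1)
```

Higher-cutoff variants (R@5, R@10) replace `argmax` with the top-k indices and check membership.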