Japanese Cloob Vit B 16

Developed by rinna
Japanese CLOOB (Contrastive Leave-One-Out Boost) model trained by rinna Co., Ltd. for cross-modal understanding of images and text
Downloads 229.51k
Release Time: 4/27/2022

Model Overview

This model is based on the CLOOB architecture. It embeds Japanese text and images in a shared space so that their similarity can be compared, which supports tasks such as image classification and text-image matching.

Model Features

Japanese Cross-Modal Understanding
A vision-language model designed specifically for Japanese that relates Japanese text to images
CLOOB Architecture
Uses the Contrastive Leave-One-Out Boost (CLOOB) method to improve cross-modal representation learning
Pre-trained ViT Model
The image encoder is initialized from the AugReg vit-base-patch16-224 model
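
The "leave-one-out" idea behind CLOOB can be illustrated with a small sketch. It compares a standard InfoNCE loss, where the matched pair also appears in the denominator, with an InfoLOOB-style loss, where the matched pair is left out. This is a minimal illustration rather than rinna's training code: it covers only the image-to-text direction, omits the Hopfield retrieval step that CLOOB applies to the embeddings, and uses a hypothetical similarity matrix and temperature.

```python
import math

def info_nce(sim, tau):
    """Image-to-text InfoNCE: the matched pair stays in the denominator."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [sim[i][j] / tau for j in range(n)]
        log_denom = math.log(sum(math.exp(z) for z in logits))
        total += -(logits[i] - log_denom)
    return total / n

def info_loob(sim, tau):
    """Image-to-text InfoLOOB: the matched pair is left out of the denominator."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [sim[i][j] / tau for j in range(n)]
        log_denom = math.log(
            sum(math.exp(z) for j, z in enumerate(logits) if j != i)
        )
        total += -(logits[i] - log_denom)
    return total / n

# Hypothetical cosine similarities: rows are images, columns are captions,
# and the diagonal holds the matched image-caption pairs.
sim = [
    [0.9, 0.1, 0.2],
    [0.0, 0.8, 0.3],
    [0.2, 0.1, 0.7],
]
tau = 0.3  # illustrative temperature, not the value used in training

nce = info_nce(sim, tau)
loob = info_loob(sim, tau)
print(f"InfoNCE:  {nce:.4f}")
print(f"InfoLOOB: {loob:.4f}")
```

Because the matched pair is excluded from the denominator, the InfoLOOB value is not bounded below by zero, so well-matched pairs keep contributing to the objective instead of saturating.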

Model Capabilities

Image Feature Extraction
Text Feature Extraction
Image-Text Matching
Cross-Modal Retrieval

Use Cases

Image Classification
Animal Image Classification
Identify animal categories in images (e.g., dogs, cats, elephants)
The example reports a 100% score for the dog label on a dog image; this is a single-image demonstration, not a measured accuracy
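
Classification of this kind is usually done by comparing one image embedding against the embeddings of candidate label texts. The sketch below shows that step with cosine similarity followed by a softmax. The 4-dimensional vectors are hypothetical stand-ins for encoder outputs, and the scale factor of 100 is an assumption borrowed from common CLIP-style examples, not a value taken from this page.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify(image_vec, text_vecs, scale=100.0):
    """Return label probabilities from scaled cosine similarities."""
    img = normalize(image_vec)
    logits = []
    for t in text_vecs:
        t = normalize(t)
        logits.append(scale * sum(a * b for a, b in zip(img, t)))
    return softmax(logits)

labels = ["犬", "猫", "象"]  # dog, cat, elephant

# Hypothetical embeddings; a real model would produce much larger vectors.
image_embedding = [0.9, 0.1, 0.0, 0.2]
text_embeddings = [
    [0.8, 0.2, 0.1, 0.1],  # 犬
    [0.1, 0.9, 0.0, 0.2],  # 猫
    [0.0, 0.1, 0.9, 0.3],  # 象
]

probs = classify(image_embedding, text_embeddings)
best = labels[max(range(len(labels)), key=probs.__getitem__)]
print(best, [round(p, 3) for p in probs])
```

With a large scale factor, even a moderate gap in cosine similarity yields a probability that rounds to 1.0, so a "100%" score for one image reflects the sharpness of the softmax as much as the model's certainty.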
Cross-Modal Retrieval
Text-to-Image Retrieval
Retrieve relevant images based on Japanese text descriptions
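
Text-to-image retrieval typically reduces to ranking precomputed image embeddings by their similarity to the query text embedding. The following is a minimal sketch of that ranking step. The file names, the 3-dimensional vectors, and the `retrieve` helper are all hypothetical; in practice the vectors would come from the model's image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, image_index, top_k=2):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in image_index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]

# Hypothetical image embeddings, computed once and stored.
image_index = {
    "dog_park.jpg": [0.9, 0.1, 0.1],
    "cat_sofa.jpg": [0.1, 0.9, 0.2],
    "elephant_zoo.jpg": [0.1, 0.2, 0.9],
}

# Hypothetical text embedding for a query such as "公園で遊ぶ犬"
# (a dog playing in a park).
query_embedding = [0.8, 0.3, 0.1]

results = retrieve(query_embedding, image_index)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Storing the image embeddings ahead of time means only the query text has to be encoded at search time.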