
CLIP ViT-B/32 Multilingual v1 ONNX

Developed by canavar
A multilingual ONNX version of the OpenAI CLIP ViT-B/32 model, producing text and image embeddings in the same vector space for 50+ languages
Downloads 86
Release date: 11/22/2023

Model Overview

This model maps text (in 50+ languages) and images into a shared dense vector space, enabling cross-modal similarity computation. This makes it suitable for image search and multilingual zero-shot image classification.
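Because both modalities land in the same space, "cross-modal similarity" reduces to comparing two vectors, typically with cosine similarity. A minimal sketch, using random stand-in vectors in place of the model's real text and image embeddings:

```python
# Sketch: cross-modal similarity in a shared embedding space.
# The embeddings below are random stand-ins; in practice they would come
# from this model's text encoder and image encoder respectively.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
dim = 512  # CLIP ViT-B/32 projects both modalities to 512-dimensional vectors
text_emb = rng.standard_normal(dim)
image_emb = rng.standard_normal(dim)

score = cosine_similarity(text_emb, image_emb)
```

A higher score means the text and image are more likely to describe the same content; a vector compared with itself scores 1.0.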

Model Features

Multilingual Alignment
Aligns text embeddings for 50+ languages with CLIP image embeddings in a single vector space via knowledge distillation
Cross-modal Retrieval
Supports similarity computation in both directions: text-to-image and image-to-text
Zero-shot Classification
Lets classification categories be defined directly as text labels, with no fine-tuning required
ONNX Format
The model is converted to ONNX format for cross-platform deployment
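An ONNX export is typically run with ONNX Runtime. The sketch below assumes the exported text encoder is a file named `model.onnx` with transformer-style inputs (`input_ids`, `attention_mask`); both the filename and the input names are assumptions about this release, not documented facts, so the code prints the graph's actual input names for verification. It also shows mean pooling over token embeddings, the usual pooling step for multilingual sentence encoders of this kind:

```python
# Hedged sketch of serving the ONNX text encoder with ONNX Runtime.
# "model.onnx" and its input names are assumptions -- inspect the graph first.
import os
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, ignoring padding."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    return (token_embeddings * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)

if os.path.exists("model.onnx"):
    import onnxruntime as ort
    session = ort.InferenceSession("model.onnx")
    # Confirm the real input names before feeding tokenized text.
    print([inp.name for inp in session.get_inputs()])

# The pooling helper works standalone:
# 1 sentence, 4 tokens (the last one is padding), embedding dim 3.
tokens = np.ones((1, 4, 3), dtype=np.float32)
mask = np.array([[1, 1, 1, 0]])
pooled = mean_pool(tokens, mask)
```

Running the session on tokenized input would yield token embeddings to feed into `mean_pool`, giving one sentence vector per input text.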

Model Capabilities

Multilingual text embedding
Image feature extraction
Cross-modal similarity computation
Zero-shot image classification
Multilingual image search

Use Cases

Image Retrieval
Multilingual Image Search
Search for relevant images using text queries in any supported language
Examples show images being matched correctly when described in different languages
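Text-to-image search boils down to ranking a gallery of precomputed image embeddings by similarity to the query's text embedding. A minimal sketch with synthetic stand-in embeddings (the query comment is an illustrative example, not output from the model):

```python
# Sketch of multilingual text-to-image search: rank a small gallery of
# image embeddings by cosine similarity to one text query embedding.
# All embeddings here are synthetic stand-ins for the model's outputs.
import numpy as np

def rank_images(query_emb: np.ndarray, image_embs: np.ndarray) -> np.ndarray:
    """Return gallery indices sorted from most to least similar."""
    q = query_emb / np.linalg.norm(query_emb)
    g = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

gallery = np.array([[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]])
query = np.array([1.0, 0.0])  # e.g. the embedding of a non-English query
order = rank_images(query, gallery)
```

Because text in any supported language is embedded into the same space, the same ranking code serves queries in all 50+ languages.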
Intelligent Classification
Zero-shot Image Classification
Define classification categories directly as text labels, with no training data required
Categories can be added or modified dynamically at query time
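Zero-shot classification follows directly from the shared space: embed each label as text, compare the image embedding to every label embedding, and take the argmax. The sketch below uses synthetic stand-in embeddings; the label phrasings are illustrative:

```python
# Hedged sketch of zero-shot classification: labels are embedded as text,
# the image embedding is compared to each, and argmax picks the class.
# All embeddings here are synthetic stand-ins for the model's real outputs.
import numpy as np

def classify(image_emb: np.ndarray, label_embs: np.ndarray, labels: list) -> str:
    """Return the label whose embedding is most similar to the image."""
    img = image_emb / np.linalg.norm(image_emb)
    lbl = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    return labels[int(np.argmax(lbl @ img))]

labels = ["a photo of a cat", "a photo of a dog"]
label_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
image_emb = np.array([0.9, 0.1])  # closer to the first label's direction
predicted = classify(image_emb, label_embs, labels)
```

Adding or changing a category at query time is just appending another label string and its text embedding; no retraining is involved.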