Jina Clip V2

Developed by jinaai
Jina CLIP v2 is a multilingual multimodal embedding model for text and images. It supports 89 languages, accepts higher-resolution images than its predecessor, and produces nested representations that can be truncated to fewer dimensions.
Downloads 47.56k
Release Time: 10/8/2024

Model Overview

Jina CLIP v2 is a multilingual multimodal embedding model that combines the Jina-XLM-RoBERTa text encoder with the EVA02-L14 vision encoder. It maps text and images into a shared, aligned embedding space, which makes it suitable for cross-modal search and understanding.

Model Features

Multilingual Support
Supports multilingual text-to-image retrieval in 89 languages, with performance improvements of up to 4% over comparable models.
High Image Resolution
Accepts 512x512 input images, up from 224x224 in the previous model, which helps it capture fine detail in images.
Nested Representation
Allows output embeddings to be truncated from the full 1024 dimensions down to as few as 64, reducing storage and processing overhead while maintaining strong performance.
Performance Improvement
Achieves a 3% performance improvement over the previous model in text-image and text-text retrieval tasks.
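
The nested representation feature above can be illustrated with a minimal sketch. It assumes the common convention for nested (Matryoshka-style) embeddings: keep the leading components and rescale the result to unit length. The page does not spell out this procedure, so treat it as an assumption. The helper names `truncate_embedding` and `storage_bytes`, the toy vector, and the 4-bytes-per-value (float32) storage figure are all hypothetical and used only for illustration.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit length.

    Assumption: the leading components of a nested embedding carry the
    most information, so a prefix can stand in for the full vector.
    """
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale
    return [x / norm for x in head]


def storage_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for `num_vectors` embeddings (float32 assumed)."""
    return num_vectors * dim * bytes_per_value


# Toy stand-in for a 1024-dimensional model output (not real model data).
full = [math.sin(i + 1) for i in range(1024)]
small = truncate_embedding(full, 64)

# One million vectors: 1024 dims versus 64 dims.
full_cost = storage_bytes(1_000_000, 1024)
small_cost = storage_bytes(1_000_000, 64)
```

Under these assumptions, going from 1024 to 64 dimensions cuts raw vector storage by a factor of 16. How much retrieval quality is retained at each size depends on the model and the task.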

Model Capabilities

Text Embedding
Image Embedding
Cross-Modal Retrieval
Multilingual Understanding
Feature Extraction
Sentence Similarity Calculation

Use Cases

Information Retrieval
Cross-Modal Search
Use text queries to search for related images, or use images to search for related text descriptions.
Multilingual Image Retrieval
Retrieve related images using text queries in different languages.
Content Understanding
Multimodal Content Analysis
Analyze the relationship between image and text content for content recommendation or classification.
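
The cross-modal search use case can be sketched as a similarity ranking: because text and images share one embedding space, a text query can be scored against image embeddings directly. The example below is a minimal sketch using cosine similarity, a common choice for this kind of ranking. The function names, file names, and three-dimensional vectors are hypothetical placeholders; real embeddings would come from the model itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates, top_k=3):
    """Return the `top_k` (name, score) pairs, best match first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical image embeddings (toy 3-dim vectors, not model output).
image_embeddings = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text query, in any supported language.
text_query_embedding = [0.8, 0.2, 0.1]

results = rank_by_similarity(text_query_embedding, image_embeddings, top_k=2)
```

The same ranking function works in the other direction: pass an image embedding as the query and a dictionary of text embeddings as the candidates.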