
GME VARCO VISION Embedding

Developed by NCSOFT
GME-VARCO-VISION-Embedding is a multimodal embedding model that computes the semantic similarity between text, images, and videos in a high-dimensional embedding space. It is particularly strong at video retrieval tasks.
Downloads 789
Release date: 6/10/2025

Model Overview

The model embeds text, images, and videos in a high-dimensional space and computes the semantic similarity between them. It targets video retrieval and is described as offering high retrieval accuracy and strong generalization.

Model Features

Multimodal embedding
The model processes three modalities (text, images, and videos) and computes the semantic similarity between them in a high-dimensional embedding space.
Video retrieval focus
The model is specifically optimized for video retrieval, a task that demands more complex processing and deeper context understanding than image retrieval.
Contrastive learning fine-tuning
The model is fine-tuned with contrastive learning on ShareGPTVideo's 17k video preference dataset, which improves its retrieval performance.
Retrieval vector enhancement
Generalization is improved by adding a retrieval vector, obtained as the weight difference between the base model and its retrieval-optimized version.
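The two training-related features above (contrastive fine-tuning and the retrieval vector) can be illustrated with a small sketch. This is not NCSOFT's training code. It assumes an InfoNCE-style loss with in-batch negatives, a hypothetical temperature of 0.05, and weights stored as plain lists of floats keyed by parameter name. Which model the retrieval vector is added to, and the `scale` factor, are also assumptions, since the description only says the vector is the weight difference between the base model and its retrieval-optimized version.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce_loss(text_embs, video_embs, temperature=0.05):
    """Contrastive loss sketch: text i should match video i.

    All other videos in the batch act as negatives. The temperature
    value is an assumption, not a documented hyperparameter.
    """
    total = 0.0
    for i, text in enumerate(text_embs):
        logits = [cosine(text, video) / temperature for video in video_embs]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum - logits[i]
    return total / len(text_embs)


def add_retrieval_vector(target, base, base_retrieval, scale=1.0):
    """Return target + scale * (base_retrieval - base), per parameter.

    `target` is the model receiving the vector (assumed here to be the
    fine-tuned model); all three arguments map parameter names to flat
    lists of floats with matching shapes.
    """
    merged = {}
    for name, weights in target.items():
        merged[name] = [
            w + scale * (r - b)
            for w, r, b in zip(weights, base_retrieval[name], base[name])
        ]
    return merged


# Toy batch: matched pairs give a lower loss than mismatched pairs.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce_loss(texts, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce_loss(texts, [[0.0, 1.0], [1.0, 0.0]])

# Toy weights: the retrieval vector here is [0.5, -0.5].
merged = add_retrieval_vector(
    {"w": [1.0, 2.0]}, {"w": [0.0, 0.0]}, {"w": [0.5, -0.5]}
)
```

In a real setting the same arithmetic would run over full model state dictionaries rather than toy lists, and the loss would be computed on embeddings produced by the model.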

Model Capabilities

Text-image retrieval
Text-video retrieval
Multimodal feature extraction
Semantic similarity calculation
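
As a rough illustration of semantic similarity calculation on extracted features, the sketch below computes a cosine similarity matrix between text embeddings and video embeddings. The 3-dimensional vectors are made-up placeholders standing in for real model outputs, and the function names are hypothetical. The model's own embedding API is not shown here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def similarity_matrix(queries, candidates):
    """Return a matrix where entry [i][j] is cosine(queries[i], candidates[j])."""
    q = [l2_normalize(v) for v in queries]
    c = [l2_normalize(v) for v in candidates]
    return [[sum(a * b for a, b in zip(qi, cj)) for cj in c] for qi in q]


# Placeholder embeddings (assumed values, not real model output).
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
video_embs = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]

sims = similarity_matrix(text_embs, video_embs)
for row in sims:
    print([round(score, 3) for score in row])
```

Normalizing first means vector length does not affect the score, so only direction in the embedding space is compared.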

Use Cases

Video retrieval
Scene-based video search: retrieve relevant video clips based on scene descriptions. Effect: high retrieval accuracy.
Description-based video search: retrieve relevant video content based on text descriptions. Effect: strong generalization performance.
Question-answer-based video search: retrieve relevant video answers based on questions. Effect: accurate context understanding.

Image retrieval
Description-based image search: retrieve relevant images based on text descriptions. Effect: efficient semantic matching.
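
The search use cases above all reduce to ranking stored items by similarity to a query embedding. A minimal sketch of that ranking step follows, assuming the embeddings have already been computed. The clip identifiers, the vectors, and the `search` helper are hypothetical and exist only for illustration.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def search(query_emb, index, k=2):
    """Return the k best (item_id, score) pairs, highest score first.

    `index` maps an item id to its precomputed embedding.
    """
    scored = ((cosine(query_emb, emb), item_id) for item_id, emb in index.items())
    top = heapq.nlargest(k, scored)
    return [(item_id, score) for score, item_id in top]


# Made-up index of video clip embeddings (assumed values).
index = {
    "clip_kitchen": [0.9, 0.1, 0.0],
    "clip_beach": [0.0, 1.0, 0.2],
    "clip_street": [0.5, 0.5, 0.5],
}
# Made-up embedding of a text query such as a scene description.
query = [1.0, 0.0, 0.0]

results = search(query, index, k=2)
for item_id, score in results:
    print(item_id, round(score, 3))
```

A linear scan like this is only practical for small collections. Larger indexes would typically use an approximate nearest-neighbor library instead.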