
Moco Sentencebertv2.0

Developed by bongsoo
A sentence embedding model optimized for Korean and English, supporting semantic similarity calculation and text feature extraction
Release Time: 9/19/2022

Model Overview

An improved sentence embedding model based on multilingual BERT, optimized through teacher-student distillation training and suited to Korean and English sentence similarity calculation, semantic search, and text clustering tasks.

Model Features

Bilingual Optimization
Specially optimized for Korean and English, excelling in semantic understanding tasks for both languages
Knowledge Distillation
Uses paraphrase-multilingual-mpnet-base-v2 as the teacher model for distillation training to enhance model performance
Extended Vocabulary
Added 32,989 new vocabulary items to the original multilingual BERT, totaling 152,537 vocabulary items
Efficient Inference
Supports input lengths of up to 128 tokens, with GPU memory usage of approximately 9GB during inference
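The features above can be sketched as a short usage example. This is a minimal, hypothetical sketch using the sentence-transformers library; the Hugging Face model identifier `bongsoo/moco-sentencebertV2.0` is an assumption inferred from the author and model names on this page and may differ, and running it requires downloading the model weights.

```python
# Hypothetical usage sketch. Assumes the model id "bongsoo/moco-sentencebertV2.0"
# (not confirmed by this page) and the sentence-transformers library.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bongsoo/moco-sentencebertV2.0")
model.max_seq_length = 128  # inputs beyond 128 tokens are truncated, per the model card

sentences = [
    "오늘 날씨가 정말 좋다",              # Korean: "The weather is really nice today"
    "The weather is really nice today",  # English paraphrase
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, hidden_dim) - one vector per sentence
```

The distilled model produces one fixed-size vector per sentence, so downstream similarity, search, and clustering all operate on these vectors rather than on raw text.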

Model Capabilities

Sentence embedding generation
Semantic similarity calculation
Text feature extraction
Cross-language semantic matching

Use Cases

Information Retrieval
Similar Question Matching
Finding semantically similar questions to user queries in Q&A systems
Achieved a cosine similarity score of 0.824 on the korsts test set
Content Recommendation
Related Article Recommendation
Recommending related articles or news based on content semantic similarity
Multilingual Applications
Korean-English Cross-language Search
Supports cross-language semantic matching between Korean and English
Achieved a similarity score of 0.843 on the stsb_multi_mt dataset
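The cross-language matching above reduces to cosine similarity between embedding vectors. The following sketch shows only the similarity computation with NumPy on small placeholder vectors; in practice the vectors would come from the model's `encode()` output.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for a Korean and an English sentence embedding;
# real embeddings would come from the sentence encoder.
ko = np.array([0.2, 0.8, 0.1])
en = np.array([0.25, 0.75, 0.05])
score = cosine_similarity(ko, en)
print(round(score, 3))
```

A Korean query and an English document that express the same meaning map to nearby vectors, so ranking candidates by this score implements cross-language semantic search directly.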