
Moco Sentencedistilbertv2.0

Developed by bongsoo
This is a Korean-English bilingual sentence-embedding model built on sentence-transformers. It maps sentences to a 768-dimensional vector space and is suitable for semantic search and clustering tasks.
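Similarity between two sentences is usually scored as the cosine of the angle between their embedding vectors. The sketch below assumes the 768-dimensional vectors have already been produced by the model (how they are obtained is not shown) and uses plain Python. `cosine_similarity` is a hypothetical helper, not part of the model's or library's API.

```python
import math
from typing import Sequence

EMBEDDING_DIM = 768  # output dimension stated for this model


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Placeholder vectors; real ones would come from the model's encoder.
    u = [1.0] * EMBEDDING_DIM
    v = [1.0] * (EMBEDDING_DIM // 2) + [0.0] * (EMBEDDING_DIM // 2)
    print(round(cosine_similarity(u, v), 4))  # 0.7071
```

Scores near 1 indicate near-identical meaning; how high a score counts as "similar" is application-specific.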
Downloads: 39
Release date: 9/5/2022

Model Overview

This model improves on mdistilbertV1.1. It was trained through STS teacher-student distillation on a 3.2M-sentence moco-corpus and supports sentence-similarity calculation in Korean and English.
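The card does not spell out the distillation objective. A widely used recipe for bilingual sentence models (multilingual knowledge distillation) trains the student so that both a sentence and its translation land on the teacher's embedding of the original sentence, using mean squared error as the loss. The sketch below illustrates that recipe on toy 3-dimensional vectors; it is an assumption about the method, not this model's actual training code, and `distillation_loss` is a hypothetical name.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    teacher_src: Sequence[float],
    student_src: Sequence[float],
    student_tgt: Sequence[float],
) -> float:
    """Assumed objective: the student should map both the source sentence and
    its translation onto the teacher's embedding of the source sentence."""
    return mse(student_src, teacher_src) + mse(student_tgt, teacher_src)


if __name__ == "__main__":
    teacher = [0.2, 0.4, 0.6]     # teacher embedding of an English sentence
    student_en = [0.2, 0.4, 0.6]  # student embedding of the same sentence
    student_ko = [0.1, 0.4, 0.8]  # student embedding of its Korean translation
    print(distillation_loss(teacher, student_en, student_ko))
```

Here only the translation term contributes, because the student already reproduces the teacher on the English sentence.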

Model Features

Bilingual Support
Supports sentence embeddings for both Korean and English
Efficient Distillation
Improves model performance through teacher-student distillation training
Large-scale Training
Trained on a 3.2M-sentence moco-corpus
Optimized Vocabulary
Vocabulary expanded to 164,314 words, adding 17,870 new words compared to the original model
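The two vocabulary figures imply an original vocabulary of 164,314 - 17,870 = 146,444 entries. The sketch below models vocabulary expansion as appending unseen tokens to a token-to-id map, assuming ids are contiguous from 0. `extend_vocab` and the synthetic token names are hypothetical; in a real model the tokenizer and the token-embedding matrix (one new row per added id) would both have to be updated.

```python
from typing import Dict, Iterable


def extend_vocab(vocab: Dict[str, int], new_tokens: Iterable[str]) -> int:
    """Append tokens missing from `vocab`, giving each the next free id.
    Assumes existing ids are 0..len(vocab)-1. Returns how many were added."""
    added = 0
    for token in new_tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
            added += 1
    return added


if __name__ == "__main__":
    original_size = 164_314 - 17_870  # implied size before expansion
    vocab = {f"tok{i}": i for i in range(original_size)}
    new_tokens = [f"new{i}" for i in range(17_870)]
    added = extend_vocab(vocab, new_tokens)
    print(original_size, added, len(vocab))  # 146444 17870 164314
```

Skipping tokens that already exist keeps ids stable, so the pretrained embeddings for the original entries remain valid.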

Model Capabilities

Sentence Embedding
Semantic Similarity Calculation
Text Clustering
Cross-language Retrieval
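Text clustering can be layered on top of the embeddings with any vector clustering method. As one simple, hypothetical choice (the card does not prescribe one), the sketch below uses greedy leader clustering with a cosine-similarity threshold; the 0.8 threshold and the 2-dimensional toy vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def leader_cluster(embeddings: List[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Single pass: each vector joins the first cluster whose leader (first
    member) is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine(embeddings[cluster[0]], emb) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
    print(leader_cluster(toy))  # [[0, 1], [2, 3]]
```

Leader clustering is order-dependent; k-means or agglomerative clustering on the same vectors are common, more stable alternatives.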

Use Cases

Information Retrieval
Cross-language Document Retrieval
Finding semantically similar documents in mixed Korean and English document libraries
Effectively identifies semantically similar documents across different languages
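Because Korean and English sentences share one vector space, cross-language retrieval reduces to nearest-neighbour search over document embeddings. The sketch below performs an exact top-k scan with `heapq.nlargest`; the document ids and 3-dimensional vectors are made-up stand-ins for real 768-dimensional embeddings, and `top_k` is a hypothetical helper.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = ((cosine(query, emb), doc_id) for doc_id, emb in corpus.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    corpus = {
        "ko_doc_1": [0.9, 0.1, 0.0],    # Korean document about Seoul
        "en_doc_1": [0.88, 0.12, 0.05],  # English document about Seoul
        "en_doc_2": [0.0, 0.2, 0.95],    # unrelated English document
    }
    query = [1.0, 0.1, 0.0]  # stand-in for an embedded query in either language
    print([doc_id for doc_id, _ in top_k(query, corpus)])  # ['ko_doc_1', 'en_doc_1']
```

For large libraries, an approximate nearest-neighbour index would usually replace the linear scan.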
Q&A Systems
Question Matching
Matching user questions with similar questions in the knowledge base
For example, it correctly identifies 'What is the capital of Korea?' and 'Seoul is the capital of Korea' as semantically similar
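Question matching can be sketched as picking the knowledge-base question with the highest cosine similarity and rejecting the match when even the best score is too low. The 0.75 threshold, the knowledge-base entries, and their 3-dimensional vectors are hypothetical placeholders; real vectors would come from encoding each question with the model.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def match_question(query: Sequence[float], kb: Dict[str, Sequence[float]],
                   threshold: float = 0.75) -> Optional[str]:
    """Return the most similar stored question, or None if the best score
    is below `threshold` (i.e. no confident match)."""
    best_q, best_score = None, -1.0
    for question, emb in kb.items():
        score = cosine(query, emb)
        if score > best_score:
            best_q, best_score = question, score
    return best_q if best_score >= threshold else None


if __name__ == "__main__":
    kb = {
        "What is the capital of Korea?": [0.9, 0.1, 0.0],
        "How do I reset my password?": [0.0, 0.1, 0.9],
    }
    print(match_question([0.85, 0.15, 0.05], kb))  # What is the capital of Korea?
    print(match_question([0.0, 1.0, 0.0], kb))     # None
```

Returning None instead of the nearest question lets the application fall back to a default answer when nothing in the knowledge base is close.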
Content Recommendation
Similar Content Recommendation
Recommending related articles or products based on semantic similarity
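The card does not describe a recommendation algorithm. One simple, hypothetical approach is to average the embeddings of items a user has already read into a profile vector, then rank the remaining items by cosine similarity to that profile. The item ids and vectors below are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recommend(history: List[str], catalog: Dict[str, Sequence[float]], k: int = 1) -> List[str]:
    """Average the embeddings of already-read items into a profile vector and
    return the k unseen items closest to it, best first."""
    if not history:
        raise ValueError("history must contain at least one item")
    dim = len(catalog[history[0]])
    profile = [sum(catalog[item][d] for item in history) / len(history) for d in range(dim)]
    seen = set(history)
    candidates = [(cosine(profile, emb), item) for item, emb in catalog.items() if item not in seen]
    candidates.sort(reverse=True)
    return [item for _, item in candidates[:k]]


if __name__ == "__main__":
    catalog = {
        "article_a": [1.0, 0.0, 0.0],
        "article_b": [0.8, 0.2, 0.0],
        "article_c": [0.0, 0.0, 1.0],
        "article_d": [0.9, 0.1, 0.1],
    }
    print(recommend(["article_a", "article_b"], catalog, k=2))  # ['article_d', 'article_c']
```

Averaging is the simplest profile; weighting recent items more heavily is a common refinement.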