
Multilingual SimCSE

Developed by WENGSYX
A contrastive learning model trained on parallel language pairs that maps texts in different languages into a shared vector space
Downloads: 84
Release Time: 3/2/2022

Model Overview

A multilingual sentence embedding model based on the mDeBERTa architecture, trained with contrastive learning on parallel corpora, which supports cross-lingual semantic similarity computation
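
As a minimal usage sketch, the model can be loaded through the Hugging Face transformers library. The model ID WENGSYX/Multilingual_SimCSE and the mean-pooling strategy below are assumptions not confirmed by this page; check the author's repository for the exact checkpoint name and pooling method.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Model ID assumed from the author's namespace; verify before use.
MODEL_ID = "WENGSYX/Multilingual_SimCSE"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def embed(sentences):
    """Encode sentences into L2-normalized embeddings.
    Mean pooling over the last hidden state is an assumption;
    the model may instead use the [CLS] token."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state      # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)       # (B, T, 1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)      # mean over real tokens
    return torch.nn.functional.normalize(pooled, dim=-1)

emb = embed(["Hello, world", "你好，世界"])
print(emb @ emb.T)  # cosine similarity matrix (embeddings are unit-norm)
```

Because the embeddings are unit-normalized, a plain dot product gives cosine similarity directly.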

Model Features

Cross-lingual Alignment
Training on parallel corpora maps sentences in different languages into a unified semantic space
Contrastive Learning Optimization
Uses a SimCSE-style contrastive loss to sharpen semantic representations (see the loss sketch after this list)
Large-scale Training
Pre-trained on 100 million parallel sentence pairs
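
For reference, a SimCSE-style contrastive objective over parallel pairs is typically the InfoNCE loss with in-batch negatives: each sentence is pulled toward its translation and pushed away from the other sentences in the batch. The sketch below illustrates that general recipe; the temperature of 0.05 follows the original SimCSE paper, and the exact hyperparameters used for this model are assumptions.

```python
import torch
import torch.nn.functional as F

def parallel_simcse_loss(src_emb: torch.Tensor,
                         tgt_emb: torch.Tensor,
                         temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss over a batch of parallel sentence pairs.

    src_emb[i] and tgt_emb[i] embed the same sentence in two languages
    (the positive pair); every other row in the batch serves as an
    in-batch negative.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    sim = src @ tgt.T / temperature        # (B, B) cosine similarities
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)    # diagonal entries are positives
```

A common design choice is to symmetrize the loss by also computing cross-entropy over sim.T, so both translation directions are aligned.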

Model Capabilities

Cross-lingual sentence embedding
Semantic similarity computation
Multilingual text alignment

Use Cases

Cross-lingual Retrieval
Multilingual Document Matching
Finding semantically similar documents across document collections in different languages (see the retrieval sketch after this list)
Example: a cosine similarity of 0.87 between 'Hello, world' and '你好，世界' (its Chinese translation)
Machine Translation Assistance
Translation Quality Evaluation
Assessing translation quality through embedding similarity
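
A minimal retrieval sketch tying both use cases together: rank candidate documents (or candidate translations) by cosine similarity to a query. It reuses the hypothetical embed() helper from the loading sketch above, and the documents are made-up examples.

```python
# Hypothetical candidates; reuses embed() from the loading sketch above.
query = "climate change mitigation"
docs = ["气候变化缓解措施",              # Chinese: climate change mitigation
        "Recetas de cocina italiana",   # Spanish: Italian cooking recipes
        "Klimaschutzmaßnahmen"]         # German: climate protection measures

q = embed([query])                      # (1, H)
d = embed(docs)                         # (N, H)
scores = (q @ d.T).squeeze(0)           # cosine similarities (unit-norm)
for doc, s in sorted(zip(docs, scores.tolist()), key=lambda x: -x[1]):
    print(f"{s:.3f}  {doc}")
```

For translation quality evaluation, the same score between a source sentence and its candidate translation can serve as a rough adequacy signal: higher cosine similarity suggests the meaning was preserved.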