USER2 Base

Developed by deepvk
USER2 is a next-generation Russian universal sentence encoder that supports long-context sentence representations of up to 8,192 tokens. It is built on RuModernBERT-base and optimized for retrieval and semantic tasks.
Downloads 1,101
Release Time: 2/25/2025

Model Overview

A universal sentence encoder designed specifically for Russian. It supports long-context representations and Matryoshka Representation Learning (MRL), and is suitable for retrieval and a range of semantic tasks.

Model Features

Long Context Support
Supports processing texts up to 8,192 tokens, ideal for long document retrieval and analysis
Matryoshka Representation Learning (MRL)
Embeddings can be truncated to fewer dimensions with minimal quality loss
Multi-Task Prefix Optimization
Uses task-specific input prefixes to optimize representations for different scenarios (classification, clustering, retrieval)
Efficient Parameter Design
The base version with 149 million parameters achieves a good balance between performance and efficiency
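
The MRL feature above means that a shorter embedding is obtained by keeping only the leading components of the full vector. A minimal sketch of that step, assuming the embedding is already available as a plain list of floats and that vectors are re-normalized to unit length after truncation (the function name `truncate_embedding` and the toy vector are illustrative, not part of the model's API):

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit length.

    Hypothetical helper for illustration; it does not call the model.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Toy 4-dimensional vector standing in for a real embedding.
full = [0.6, 0.0, 0.8, 0.0]
small = truncate_embedding(full, 2)
```

Re-normalizing matters when downstream code assumes unit-length vectors, for example when a dot product is used in place of cosine similarity.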

Model Capabilities

Text Embedding Generation
Semantic Similarity Calculation
Document Retrieval
Text Clustering
Multi-Label Classification
Re-Ranking Tasks

Use Cases

Information Retrieval
Long Document Retrieval
Finding relevant information in long document collections
Achieves nDCG@10 of 54.17 on MLDR-rus test
Question Answering System
Matching questions with candidate answers
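
For readers unfamiliar with the nDCG@10 metric quoted above, the following sketch shows one common way to compute it. It assumes the linear-gain variant and takes the ideal ordering over the same list of judged results; benchmark implementations may differ in both respects, so this is an illustration of the metric rather than the evaluation code behind the reported score.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so the top result is divided by log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant results at all
    return dcg_at_k(relevances, k) / ideal


# Relevance labels of retrieved documents, in ranked order (toy data).
ranking = [0, 1, 0, 1]
score = ndcg_at_k(ranking, k=10)
```

A ranking that places all relevant documents first gets a value of 1.0, and the value drops as relevant documents move down the list.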
Text Analysis
Text Clustering
Grouping similar documents together
Scores 59.22 on MTEB-rus clustering task
Semantic Similarity Calculation
Measuring semantic relationships between texts
Scores 74.28 on MTEB-rus similarity task
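
Semantic similarity between two embeddings is typically measured with cosine similarity. The sketch below assumes that metric and works on plain lists of floats; the helper names and toy vectors are illustrative and do not come from the model's own tooling.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [
        (i, cosine_similarity(query_vec, vec))
        for i, vec in enumerate(candidate_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranked = rank_by_similarity(query, candidates)
```

The same ranking function covers the retrieval and question-answering use cases: embed the query and the candidates, then sort candidates by score.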