🚀 USER2-small
USER2 is a new-generation Universal Sentence Encoder for Russian. It's designed for sentence representation and supports long contexts of up to 8,192 tokens.
The models are built on top of the RuModernBERT encoders and fine-tuned for retrieval and semantic tasks. They also support Matryoshka Representation Learning (MRL), a technique that can reduce embedding size with minimal loss in representation quality.
This is a small model with 34 million parameters.
🚀 Quick Start
Model Information
| Property | Details |
|----------|---------|
| Model Type | Sentence Transformer |
| Base Model | deepvk/RuModernBERT-small |
| Training Datasets | nomic-en, nomic-ru, in-house En-Ru parallel, cultura-sampled, etc. |
| License | apache-2.0 |
Model Comparison
| Model | Size | Context Length | Hidden Dim | MRL Dims |
|-------|------|----------------|------------|----------|
| deepvk/USER2-small | 34M | 8192 | 384 | [32, 64, 128, 256, 384] |
| deepvk/USER2-base | 149M | 8192 | 768 | [32, 64, 128, 256, 384, 512, 768] |
✨ Features
- Long-context Support: Capable of handling contexts up to 8,192 tokens.
- Matryoshka Representation Learning (MRL): Allows for dimensionality reduction of embeddings with minimal quality loss.
- Task-specific Prefixes: Supports task-specific prefixes for better performance in different tasks.
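As a quick illustration of the long-context support, the sketch below encodes a document well beyond a typical 512-token window. The repeated sentence and the explicit `max_seq_length` assignment are illustrative assumptions; the released configuration may already set the 8,192-token limit.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("deepvk/USER2-small")
model.max_seq_length = 8192  # illustrative; the shipped config may already use this limit

# A long document built by repetition, purely for demonstration.
long_document = "search_document: " + "Зачислен в списки ВМФ СССР 19 августа 1952 года. " * 400
embedding = model.encode([long_document])
print(embedding.shape)  # (1, 384)
```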
💻 Usage Examples
Prefixes
This model is trained similarly to Nomic Embed and requires task-specific prefixes to be added to the input. The choice of prefix depends on the specific task. Here are some general guidelines:
- "classification: " is the default and most universal prefix, often performing well across various tasks.
- "clustering: " is recommended for clustering applications, such as grouping texts into clusters, discovering shared topics, or removing semantic duplicates.
- "search_query: " and "search_document: " are intended for retrieval and reranking tasks. In some classification tasks, especially with shorter texts, "search_query" shows better performance than other prefixes. On the other hand, "search_document" can be beneficial for long-context sentence similarity tasks.
Sentence Transformers
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("deepvk/USER2-small")

query_embeddings = model.encode(["Когда был спущен на воду первый миноносец «Спокойный»?"], prompt_name="search_query")
document_embeddings = model.encode(["Спокойный (эсминец)\nЗачислен в списки ВМФ СССР 19 августа 1952 года."], prompt_name="search_document")

similarities = model.similarity(query_embeddings, document_embeddings)
```
To truncate the embedding dimension, simply pass the new value to the model initialization:
```python
model = SentenceTransformer("deepvk/USER2-small", truncate_dim=128)
```
This model was trained with dimensions [32, 64, 128, 256, 384], so it's recommended to use one of these for best performance.
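For non-retrieval tasks, the prefixes from the guidelines above can simply be prepended to the input text instead of using `prompt_name`. A minimal sketch for a clustering-style use, with illustrative example sentences:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("deepvk/USER2-small")

# Prepend the task prefix directly to each input string.
sentences = [
    "clustering: Спокойный (эсминец) зачислен в списки ВМФ СССР 19 августа 1952 года.",
    "clustering: Эсминец «Спокойный» был включён в состав ВМФ СССР в августе 1952 года.",
    "clustering: Балтийское море расположено в Северной Европе.",
]
embeddings = model.encode(sentences)
print(model.similarity(embeddings, embeddings))
```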
Transformers
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel


def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    )
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )


queries = ["search_query: Когда был спущен на воду первый миноносец «Спокойный»?"]
documents = ["search_document: Спокойный (эсминец)\nЗачислен в списки ВМФ СССР 19 августа 1952 года."]

tokenizer = AutoTokenizer.from_pretrained("deepvk/USER2-small")
model = AutoModel.from_pretrained("deepvk/USER2-small")

encoded_queries = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
encoded_documents = tokenizer(documents, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    queries_outputs = model(**encoded_queries)
    documents_outputs = model(**encoded_documents)

query_embeddings = mean_pooling(queries_outputs, encoded_queries["attention_mask"])
query_embeddings = F.normalize(query_embeddings, p=2, dim=1)
doc_embeddings = mean_pooling(documents_outputs, encoded_documents["attention_mask"])
doc_embeddings = F.normalize(doc_embeddings, p=2, dim=1)

similarities = query_embeddings @ doc_embeddings.T
```
To truncate the embedding dimension, take only the first `truncate_dim` values and re-normalize:
```python
truncate_dim = 128  # one of the MRL dimensions listed above

query_embeddings = mean_pooling(queries_outputs, encoded_queries["attention_mask"])
query_embeddings = query_embeddings[:, :truncate_dim]
query_embeddings = F.normalize(query_embeddings, p=2, dim=1)
```
Note that L2 normalization is applied after truncation, so the cropped embeddings remain unit-length for cosine similarity.
📚 Documentation
Performance
To evaluate the model, we measure quality on the MTEB-rus benchmark. Additionally, to measure long-context retrieval, we run the Russian subset of the MultiLongDocRetrieval (MLDR) task.
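For reference, here is a rough sketch of how such an evaluation could be reproduced with the mteb package. The language-based task selection is only an approximation of the MTEB-rus task list, and behaviour may vary between mteb versions, so treat this as an assumption-laden sketch rather than the exact setup behind the numbers below.
```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("deepvk/USER2-small")

# Approximate MTEB-rus by filtering tasks by language; the exact task list may differ.
tasks = mteb.get_tasks(languages=["rus"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/USER2-small")
```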
MTEB-rus
| Model | Size | Hidden Dim | Context Length | MRL support | Mean(task) | Mean(taskType) | Classification | Clustering | MultiLabelClassification | PairClassification | Reranking | Retrieval | STS |
|-------|------|-----------|----------------|-------------|------------|----------------|----------------|------------|--------------------------|--------------------|-----------|-----------|-----|
| USER-base | 124M | 768 | 512 | ❌ | 58.11 | 56.67 | 59.89 | 53.26 | 37.72 | 59.76 | 55.58 | 56.14 | 74.35 |
| USER-bge-m3 | 359M | 1024 | 8192 | ❌ | 62.80 | 62.28 | 61.92 | 53.66 | 36.18 | 65.07 | 68.72 | 73.63 | 76.76 |
| multilingual-e5-base | 278M | 768 | 512 | ❌ | 58.34 | 57.24 | 58.25 | 50.27 | 33.65 | 54.98 | 66.24 | 67.14 | 70.16 |
| multilingual-e5-large-instruct | 560M | 1024 | 512 | ❌ | 65.00 | 63.36 | 66.28 | 63.13 | 41.15 | 63.89 | 64.35 | 68.23 | 76.48 |
| jina-embeddings-v3 | 572M | 1024 | 8192 | ✅ | 63.45 | 60.93 | 65.24 | 60.90 | 39.24 | 59.22 | 53.86 | 71.99 | 76.04 |
| ru-en-RoSBERTa | 404M | 1024 | 512 | ❌ | 61.71 | 60.40 | 62.56 | 56.06 | 38.88 | 60.79 | 63.89 | 66.52 | 74.13 |
| USER2-small | 34M | 384 | 8192 | ✅ | 58.32 | 56.68 | 59.76 | 57.06 | 33.56 | 54.02 | 58.26 | 61.87 | 72.25 |
| USER2-base | 149M | 768 | 8192 | ✅ | 61.12 | 59.59 | 61.67 | 59.22 | 36.61 | 56.39 | 62.06 | 66.90 | 74.28 |
MLDR-rus
| Model | Size | nDCG@10 ↑ |
|-------|------|-----------|
| USER-bge-m3 | 359M | 58.53 |
| KaLM-v1.5 | 494M | 53.75 |
| jina-embeddings-v3 | 572M | 49.67 |
| E5-mistral-7b | 7.11B | 52.40 |
| USER2-small | 34M | 51.69 |
| USER2-base | 149M | 54.17 |
We compare only models with a context length of 8192.
Matryoshka
To evaluate MRL capabilities, we also use MTEB-rus, applying dimensionality cropping to the embeddings to match the selected size.
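A minimal sketch of this cropping procedure: slice the embeddings to the target dimension, then re-normalize before computing cosine similarity. The example sentences are illustrative.
```python
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("deepvk/USER2-small")
sentences = [
    "classification: Спокойный (эсминец) зачислен в списки ВМФ СССР 19 августа 1952 года.",
    "classification: Эсминец «Спокойный» был включён в состав ВМФ СССР в августе 1952 года.",
]
full = model.encode(sentences, convert_to_tensor=True)

# Crop to each trained MRL dimension and re-normalize before scoring.
for dim in [32, 64, 128, 256, 384]:
    cropped = F.normalize(full[:, :dim], p=2, dim=1)
    print(dim, round((cropped[0] @ cropped[1]).item(), 4))
```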

🔧 Technical Details
Training details
This is the small version with 34 million parameters, based on RuModernBERT-small. It was fine-tuned in three stages: RetroMAE, Weakly Supervised Fine-Tuning, and Supervised Fine-Tuning.
Following the bge-m3 training strategy, we use RetroMAE as a retrieval-oriented continuous pretraining step. Leveraging data from the final stage of RuModernBERT training, RetroMAE enhances retrieval quality, especially for long-context inputs.
To follow best practices for building a state-of-the-art encoder, we rely on large-scale training with weakly related text pairs. However, such datasets are not publicly available for Russian, unlike for English or Chinese. To overcome this, we apply two complementary strategies:
- Cross-lingual transfer: We train on both English and Russian data, leveraging English resources (nomic-unsupervised) alongside our in-house English-Russian parallel corpora.
- Unsupervised pair mining: From the deepvk/cultura_ru_edu corpus, we extract 50M pairs using a simple heuristic: selecting non-overlapping text blocks that are not substrings of one another (a sketch of this heuristic follows the list).
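A rough, hypothetical reconstruction of such a heuristic is shown below; the block size and the pairing of adjacent blocks are illustrative choices, not the exact production pipeline.
```python
def mine_pairs(document: str, block_size: int = 512) -> list[tuple[str, str]]:
    """Pair adjacent, non-overlapping text blocks of a document, keeping only
    pairs where neither block is a substring of the other (illustrative sketch)."""
    blocks = [document[i : i + block_size] for i in range(0, len(document), block_size)]
    return [
        (left, right)
        for left, right in zip(blocks, blocks[1:])
        if left not in right and right not in left
    ]
```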
This approach has shown promising results, allowing us to train high-performing models with minimal target-language pairs, especially when compared to pipelines used for other languages.
The table below shows the datasets used and the number of times each was upsampled.
For the third stage, we switch to cleaner, task-specific datasets. In some cases, additional filtering was applied using a cross-encoder. For all retrieval datasets, we mine hard negatives.
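A simplified sketch of one common way to mine hard negatives with the model itself: retrieve the top-scoring passages for each query and drop the gold positives. The toy data and top_k value are illustrative, not the pipeline used for training.
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("deepvk/USER2-small")

queries = ["search_query: Когда был спущен на воду первый миноносец «Спокойный»?"]
corpus = [
    "search_document: Спокойный (эсминец)\nЗачислен в списки ВМФ СССР 19 августа 1952 года.",
    "search_document: Балтийское море расположено в Северной Европе.",
    "search_document: Список эсминцев ВМФ СССР включает корабли разных проектов.",
]
positives = {0: {0}}  # query index -> indices of gold passages

query_emb = model.encode(queries, convert_to_tensor=True)
corpus_emb = model.encode(corpus, convert_to_tensor=True)

# The highest-scoring non-gold passages are kept as hard negatives.
for qi, hits in enumerate(util.semantic_search(query_emb, corpus_emb, top_k=3)):
    hard_negatives = [h["corpus_id"] for h in hits if h["corpus_id"] not in positives[qi]]
    print(qi, hard_negatives)
```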
Ablation
Alongside the final model, we also release all intermediate training steps. Both the retromae and weakly_sft models are available under the specified revisions in this repository. We hope these additional models prove useful for your experiments.
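The intermediate checkpoints can be loaded by passing a revision. The revision names below follow the stage names mentioned above and are an assumption; verify them against the repository's listed revisions.
```python
from transformers import AutoModel, AutoTokenizer

# Assumed revision names matching the stage names above; check the repo if they differ.
tokenizer = AutoTokenizer.from_pretrained("deepvk/USER2-small", revision="weakly_sft")
model = AutoModel.from_pretrained("deepvk/USER2-small", revision="weakly_sft")
```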
Below is a comparison of all training stages on a subset of MTEB-rus.

📄 License
This project is licensed under the Apache-2.0 license.
📖 Citations
```
@misc{deepvk2025user,
    title={USER2},
    author={Malashenko, Boris and Spirin, Egor and Sokolov, Andrey},
    url={https://huggingface.co/deepvk/USER2-small},
    publisher={Hugging Face},
    year={2025},
}
```