🚀 ruRoPEBert Classic Model for Russian language
This is an encoder model from Tochka AI based on the RoPEBert architecture, built with the cloning method described in our article on Habr. The model is trained on the CulturaX dataset. It uses ai-forever/ruBert-base as the original model and surpasses it in quality on the encodechka benchmark.
The model source code is available in the file modeling_rope_bert.py. The model is trained on contexts up to 512 tokens in length, but can be used on larger contexts. For better quality, consider using the version of this model with extended context - Tochka-AI/ruRoPEBert-classic-base-2k.
🚀 Quick Start
Prerequisites
⚠️ Important Note
The recommended version of transformers is 4.37.2 or higher. To load the model correctly, you must enable downloading code from the model's repository: trust_remote_code=True. This will download the modeling_rope_bert.py script and load the weights into the correct architecture. Alternatively, you can download this script manually and use the classes from it directly to load the model.
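As a quick sanity check for the version requirement above, the installed transformers version can be compared against 4.37.2. This is a minimal sketch using only the standard library; the helper names parse_version and meets_minimum are illustrative and not part of transformers.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = "4.37.2"  # recommended minimum stated in this README


def parse_version(version):
    """Turn '4.37.2' into (4, 37, 2); stops at the first non-numeric piece such as 'dev0'."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """Return True when the installed version is at least the recommended one."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = metadata.version("transformers")
    status = "OK" if meets_minimum(installed) else "older than recommended"
    print(f"transformers {installed}: {status}")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```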
💻 Usage Examples
🔍 Basic Usage (no efficient attention)
from transformers import AutoTokenizer, AutoModel

model_name = 'Tochka-AI/ruRoPEBert-classic-base-512'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, attn_implementation='eager')
⚡ With SDPA (efficient attention)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, attn_implementation='sdpa')
📈 Getting embeddings
The correct pooler (mean) is already built into the model architecture; it averages token embeddings based on the attention mask. You can also select the first_token_transform pooler type, which applies a learnable linear transformation to the first token.
To change the built-in pooler implementation, use the pooler_type parameter in the AutoModel.from_pretrained function.
import torch

test_batch = tokenizer.batch_encode_plus(["Привет, чем занят?", "Здравствуйте, чем вы занимаетесь?"], return_tensors='pt', padding=True)
with torch.inference_mode():
    pooled_output = model(**test_batch).pooler_output
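The masked averaging performed by the mean pooler can be illustrated with a small dependency-free sketch. It is written in plain Python over nested lists so the arithmetic is visible; the function name masked_mean and the toy numbers are illustrative assumptions, and this is not the code from modeling_rope_bert.py.

```python
def masked_mean(token_embeddings, attention_mask):
    """Average token vectors per sequence, skipping padded positions.

    token_embeddings: [batch][seq_len][hidden] nested lists of floats
    attention_mask:   [batch][seq_len] with 1 for real tokens, 0 for padding
    """
    pooled = []
    for vectors, mask in zip(token_embeddings, attention_mask):
        total = [0.0] * len(vectors[0])
        for vector, keep in zip(vectors, mask):
            if keep:  # padded positions do not contribute to the sum
                for i, value in enumerate(vector):
                    total[i] += value
        count = max(sum(mask), 1)  # guard against an all-padding row
        pooled.append([t / count for t in total])
    return pooled


# Two sequences, three positions, hidden size 2; the last position of the second is padding.
embeddings = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]],
]
mask = [[1, 1, 1], [1, 1, 0]]
print(masked_mean(embeddings, mask))  # [[3.0, 4.0], [3.0, 1.0]]
```

The padded vector [9.0, 9.0] has no effect on the second result, which is why averaging with the attention mask matters for batches with padding.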
In addition, you can calculate cosine similarities between texts in batch using normalization and matrix multiplication:
import torch.nn.functional as F
F.normalize(pooled_output, dim=1) @ F.normalize(pooled_output, dim=1).T
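To see why normalization followed by matrix multiplication yields cosine similarities, here is the same computation spelled out in plain Python on toy vectors. The helper names and the input values are illustrative assumptions; real embeddings would come from pooled_output.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def cosine_similarity_matrix(embeddings):
    """Dot products between unit vectors are exactly the pairwise cosine similarities."""
    unit = [l2_normalize(row) for row in embeddings]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


pooled = [[3.0, 4.0], [4.0, 3.0], [-3.0, -4.0]]
sims = cosine_similarity_matrix(pooled)
for row in sims:
    print([round(value, 2) for value in row])
# [1.0, 0.96, -1.0]
# [0.96, 1.0, -0.96]
# [-1.0, -0.96, 1.0]
```

Each diagonal entry is 1 because every text is identical to itself, and the matrix is symmetric.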
📊 Using as classifier
To load the model with a trainable classification head on top (change the num_labels parameter):
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True, attn_implementation='sdpa', num_labels=4)
📏 With RoPE scaling
Allowed types for RoPE scaling are: linear and dynamic. To extend the model's context window, you need to change the tokenizer's max length and add the rope_scaling parameter.
If you want to scale your model context by 2x:
tokenizer.model_max_length = 1024
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation='sdpa',
    rope_scaling={'type': 'dynamic', 'factor': 2.0},
)
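The two values used above (the new tokenizer length and the rope_scaling dictionary) can be derived from a single scaling factor. The sketch below assumes the target length is simply the base length times the factor, as in the 512 to 1024 example; rope_scaling_config is a hypothetical helper, and its factor check is a sanity check of this sketch rather than a documented constraint.

```python
ALLOWED_ROPE_TYPES = ("linear", "dynamic")  # the two types listed in this README


def rope_scaling_config(base_max_length, factor, scaling_type="dynamic"):
    """Return (new_max_length, rope_scaling dict) for a given scaling factor."""
    if scaling_type not in ALLOWED_ROPE_TYPES:
        raise ValueError(f"unsupported RoPE scaling type: {scaling_type!r}")
    if factor < 1.0:
        # Assumption of this sketch: scaling is only used to extend the context.
        raise ValueError("factor is expected to be >= 1.0 when extending context")
    new_max_length = int(base_max_length * factor)
    return new_max_length, {"type": scaling_type, "factor": float(factor)}


max_length, rope_scaling = rope_scaling_config(512, 2.0)
print(max_length, rope_scaling)  # 1024 {'type': 'dynamic', 'factor': 2.0}
```

The results would then be used as tokenizer.model_max_length = max_length and rope_scaling=rope_scaling in the from_pretrained call.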
💡 Usage Tip
To use resources efficiently, don't forget to specify the dtype and device you need when loading the model.
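One possible way to follow this tip is sketched below. load_for_inference is a hypothetical helper, and picking float16 on GPU and float32 on CPU is an assumption of this sketch rather than a recommendation from the authors; adjust both to your hardware.

```python
def load_for_inference(model_name="Tochka-AI/ruRoPEBert-classic-base-512"):
    """Load the encoder on the available device with a matching dtype."""
    # Imported here so the heavy dependencies are only required when the helper is called.
    import torch
    from transformers import AutoModel

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Assumption: half precision on GPU, full precision on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32
    model = AutoModel.from_pretrained(
        model_name,
        trust_remote_code=True,
        attn_implementation="sdpa",
        torch_dtype=dtype,
    )
    return model.to(device).eval(), device
```

After calling model, device = load_for_inference(), the tokenized batch also has to be moved to the same device (for example with test_batch.to(device)) before it is passed to the model.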
📊 Metrics
Evaluation of this model on the encodechka benchmark:
| Model name | STS | PI | NLI | SA | TI | IA | IC | ICX | NE1 | NE2 | Avg S (no NE) | Avg S+W (with NE) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ruRoPEBert-classic-base-512 | 0.695 | 0.605 | 0.396 | 0.794 | 0.975 | 0.797 | 0.769 | 0.386 | 0.410 | 0.609 | 0.677 | 0.630 |
| ai-forever/ruBert-base | 0.670 | 0.533 | 0.391 | 0.773 | 0.975 | 0.783 | 0.765 | 0.384 | - | - | 0.659 | - |
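The Avg S (no NE) values appear to be the plain mean of the eight sentence-level scores (STS through ICX). This is an observation from the reported numbers, not a statement of how encodechka defines the column, and it can be checked with a few lines:

```python
# Sentence-level scores (STS, PI, NLI, SA, TI, IA, IC, ICX) copied from the metrics above.
scores = {
    "ruRoPEBert-classic-base-512": [0.695, 0.605, 0.396, 0.794, 0.975, 0.797, 0.769, 0.386],
    "ai-forever/ruBert-base": [0.670, 0.533, 0.391, 0.773, 0.975, 0.783, 0.765, 0.384],
}


def mean_score(values):
    """Unweighted mean rounded to three decimals, matching the precision of the reported scores."""
    return round(sum(values) / len(values), 3)


for name, values in scores.items():
    print(name, mean_score(values))
# ruRoPEBert-classic-base-512 0.677
# ai-forever/ruBert-base 0.659
```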
👨‍💻 Authors
- Sergei Bratchikov (Tochka AI Team, HF, GitHub)
- Maxim Afanasiev (Tochka AI Team, HF, GitHub)