
DeBERTa V1 Distill

Developed by deepvk
A bidirectional encoder model pre-trained for the Russian language on large-scale text corpora with the standard masked language modeling objective
Downloads 166
Release Time: 3/17/2023

Model Overview

This is a pre-trained Russian-language model based on the DeBERTa architecture, used primarily for feature extraction. It was compressed via knowledge distillation while retaining the core capabilities of its teacher model.
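
The snippet below is a minimal feature-extraction sketch. It assumes the checkpoint is published on the Hugging Face Hub as deepvk/deberta-v1-distill and loads with the standard transformers Auto classes; neither detail is stated on this page.

```python
# Hedged sketch: extract sentence embeddings by mean-pooling the encoder
# output. The model id below is an assumption, not taken from this page.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "deepvk/deberta-v1-distill"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

texts = ["Пример русского предложения.", "Ещё один короткий пример."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (2, hidden_size)
```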

Model Features

Efficient Distillation
Adopts the distillation method of Sanh et al., initializing the student from every other layer of the teacher, which reduces model size while largely preserving performance (a sketch of this initialization follows this list)
Large-scale Training Data
Uses 400GB of rigorously deduplicated mixed text data, including sources such as Wikipedia, social media, and literary websites
Optimized Deduplication Process
Employs 5-character shingle fingerprints with MinHash for efficient near-duplicate removal, ensuring high-quality training data (a deduplication sketch also follows this list)
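
As a rough illustration of the initialization scheme mentioned under Efficient Distillation, the sketch below copies every other teacher encoder layer into a half-depth student. It is a generic PyTorch sketch, not the actual deepvk training code, and it assumes the encoder layers are held in torch.nn.ModuleList containers.

```python
# Illustrative Sanh et al.-style student initialization: the student
# takes its weights from every other encoder layer of the teacher.
# Generic sketch only; not the exact recipe used for this model.
import torch.nn as nn

def init_student_from_teacher(student_layers: nn.ModuleList,
                              teacher_layers: nn.ModuleList) -> None:
    """Copy every stride-th teacher layer's weights into the student."""
    stride = len(teacher_layers) // len(student_layers)
    for i, student_layer in enumerate(student_layers):
        student_layer.load_state_dict(teacher_layers[i * stride].state_dict())
```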
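
Likewise, the deduplication step described under Optimized Deduplication Process can be sketched with 5-character shingles and MinHash. This example uses the third-party datasketch library; the similarity threshold and permutation count are placeholder values, not the ones used for this model's corpus.

```python
# Illustrative near-duplicate filtering with 5-character shingles + MinHash.
# Thresholds are placeholders; this is not the deepvk pipeline itself.
from datasketch import MinHash, MinHashLSH

def shingles(text: str, k: int = 5) -> set:
    """Overlapping k-character shingles of the text."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def minhash(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for s in shingles(text):
        m.update(s.encode("utf-8"))
    return m

lsh = MinHashLSH(threshold=0.8, num_perm=128)
docs = {
    "doc1": "Москва - столица России и крупнейший город страны.",
    "doc2": "Москва - столица России и самый крупный город страны.",
}
for doc_id, text in docs.items():
    sig = minhash(text)
    if lsh.query(sig):  # any hit is a likely near-duplicate
        print(f"{doc_id}: near-duplicate, dropped")
        continue
    lsh.insert(doc_id, sig)
```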

Model Capabilities

Russian text feature extraction
Multilingual understanding
Contextual encoding

Use Cases

Natural Language Processing
Russian Text Classification
Can be used for tasks such as sentiment analysis and topic classification of Russian texts (a classification sketch follows this list)
Information Retrieval
Generates high-quality embeddings for Russian documents to improve retrieval effectiveness (a retrieval sketch follows this list)
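
For the text classification use case, a minimal fine-tuning setup might look like the sketch below. The model id and the two-label sentiment task are assumptions; the classification head is freshly initialized and only becomes useful after training.

```python
# Hedged sketch: wrap the (assumed) checkpoint with a sequence
# classification head for Russian sentiment analysis.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "deepvk/deberta-v1-distill"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

batch = tokenizer(["Отличный фильм!", "Скучный и затянутый сюжет."],
                  padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative (toy labels)

outputs = model(**batch, labels=labels)
outputs.loss.backward()  # one gradient step of a normal fine-tuning loop
```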
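
For the information retrieval use case, documents and a query can be embedded with the same mean-pooling recipe as in the Model Overview sketch and ranked by cosine similarity. Again, the model id is an assumption.

```python
# Hedged retrieval sketch: rank Russian documents against a query by
# cosine similarity of mean-pooled embeddings. Model id is assumed.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "deepvk/deberta-v1-distill"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

docs = ["Команда выиграла вчерашний матч со счётом 2:1.",
        "Классический рецепт борща со свёклой и говядиной."]
scores = F.cosine_similarity(embed(["как приготовить борщ"]), embed(docs))
print(docs[int(scores.argmax())])  # prints the recipe document
```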