e5-large-en-ru Open-Source Model: Contains Only Russian and English Tokens, Retains Original Performance

E5 Large En Ru

Developed by d0rj

This is a vocabulary-pruned version of the intfloat/multilingual-e5-large model, retaining only Russian and English tokens while maintaining the original model's performance.

Text Embedding

Transformers

Supports Multiple Languages

Open Source License: MIT

#English-Russian Bilingual Embedding #Retrieval Optimization #Vocabulary Pruning

Downloads 712

Release Time: 9/18/2023

Model Overview

E5-large-en-ru is a text embedding model for Russian and English, derived from intfloat/multilingual-e5-large by vocabulary pruning. It is suitable for tasks such as information retrieval and semantic similarity calculation.

Model Features

Vocabulary Optimization

Pruning retains only Russian and English tokens, reducing model size from 2135.82 MB to 1394.8 MB (about 35% smaller) while maintaining performance.
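The idea behind vocabulary pruning can be illustrated with a toy example. The sketch below is not the author's pruning script; the function name `prune_embeddings` and the tiny 4-token matrix are hypothetical. It only shows the core step: keep the embedding rows for the tokens you want and build a mapping from old token ids to new ones.

```python
from typing import Dict, List, Sequence, Tuple


def prune_embeddings(
    embeddings: Sequence[Sequence[float]],
    keep_ids: Sequence[int],
) -> Tuple[List[List[float]], Dict[int, int]]:
    """Keep only the rows listed in keep_ids (hypothetical helper).

    Returns the smaller matrix and a mapping old_id -> new_id.
    """
    kept = sorted(set(keep_ids))  # deduplicate, keep original order of ids
    id_map = {old: new for new, old in enumerate(kept)}
    pruned = [list(embeddings[old]) for old in kept]
    return pruned, id_map


# Toy vocabulary of 4 tokens with 2-dimensional embeddings.
embeddings = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
pruned, id_map = prune_embeddings(embeddings, keep_ids=[3, 0, 3])
```

In a real model the tokenizer vocabulary has to be remapped with the same `id_map`, otherwise token ids would point at the wrong rows.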

High-Performance Retrieval

On the SberQuAD benchmark (4122 questions), all reported retrieval metrics differ from the original model's by less than 0.001 (for example, recall@5 of 0.8285 versus 0.8278).

Multi-Task Adaptation

Distinguishes task types (query, passage, and symmetric tasks) via prefixes added to the input text.
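A minimal sketch of the prefix convention, assuming the model follows the same scheme as the original E5 models: "query: " for questions and for both sides of symmetric tasks, "passage: " for documents to be retrieved. The helper names `add_e5_prefix` and `embed` are hypothetical; `embed` assumes the sentence-transformers library is installed and that the model is available under the id d0rj/e5-large-en-ru.

```python
from typing import List


def add_e5_prefix(texts: List[str], role: str) -> List[str]:
    """Prepend the task prefix ('query' or 'passage') to each text."""
    if role not in ("query", "passage"):
        raise ValueError("role must be 'query' or 'passage'")
    return [f"{role}: {text}" for text in texts]


def embed(texts: List[str], role: str):
    """Encode prefixed texts (assumes sentence-transformers is installed)."""
    # Imported lazily so the prefix helper works without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("d0rj/e5-large-en-ru")
    return model.encode(add_e5_prefix(texts, role), normalize_embeddings=True)


queries = add_e5_prefix(["How many people live in Moscow?"], "query")
passages = add_e5_prefix(["Москва - столица России."], "passage")
```

Calling `embed(["..."], "query")` downloads the model on first use, so it is left out of the example above.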

Model Capabilities

Text vectorization

Semantic similarity calculation

Information retrieval

Cross-language text matching

Use Cases

Information Retrieval

Open-Domain Question Answering

Used to retrieve the most relevant document passages for questions.

Achieves recall@5 of 82.8% on SberQuAD (4122 questions).
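For readers unfamiliar with the metric names, here is a minimal sketch of recall@k and mrr@k. It assumes each question has exactly one relevant passage, and it uses the common textbook definitions; the evaluation script behind the reported numbers is not shown on this page, so its details may differ. The toy rankings are made up for illustration.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[Sequence[int]], gold: Sequence[int], k: int) -> float:
    """Fraction of questions whose gold passage appears in the top k results."""
    hits = sum(1 for results, g in zip(ranked, gold) if g in results[:k])
    return hits / len(gold)


def mrr_at_k(ranked: Sequence[Sequence[int]], gold: Sequence[int], k: int) -> float:
    """Mean of 1/rank of the gold passage, counting 0 if it is outside the top k."""
    total = 0.0
    for results, g in zip(ranked, gold):
        top = list(results[:k])
        if g in top:
            total += 1.0 / (top.index(g) + 1)
    return total / len(gold)


# Three toy questions: ranked passage ids per question, and the gold passage id.
ranked = [[3, 1, 2], [0, 2, 1], [2, 0, 1]]
gold = [1, 1, 0]

r2 = recall_at_k(ranked, gold, k=2)  # gold found in top 2 for questions 1 and 3
m3 = mrr_at_k(ranked, gold, k=3)     # gold ranks are 2, 3, 2
```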

Semantic Analysis

Document Similarity Calculation

Used to compare the semantic similarity of different documents.
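Similarity between two embeddings is typically measured with cosine similarity. The sketch below uses short made-up vectors in place of real model output, which would have many more dimensions; `cosine_similarity` is a hypothetical helper written with the standard library only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up stand-ins for document embeddings.
doc_a = [0.1, 0.3, 0.5]
doc_b = [0.2, 0.6, 1.0]   # same direction as doc_a
doc_c = [0.5, -0.1, 0.0]  # points elsewhere

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

If the embeddings are already L2-normalized, the dot product alone gives the same value.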

Property	intfloat/multilingual-e5-large	d0rj/e5-large-en-ru
Model size (MB)	2135.82	1394.8
Params (count)	559,890,946	365,638,14
Word embeddings params (count)	256,002,048	61,749,248
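The word-embedding figures in the table can be cross-checked with simple arithmetic. The sketch below assumes a hidden size of 1024 (the value used by XLM-RoBERTa-large, on which multilingual-e5-large is based) and assumes that pruning changes nothing except the word-embedding matrix. Under those assumptions it derives the vocabulary sizes and an estimated parameter count for the pruned model; the estimate is a derived value, not a figure confirmed by the source.

```python
HIDDEN_SIZE = 1024  # assumption: hidden size of XLM-RoBERTa-large

# Figures copied from the table above.
orig_embedding_params = 256_002_048
pruned_embedding_params = 61_749_248
orig_total_params = 559_890_946

# Vocabulary size = embedding parameters / hidden size.
orig_vocab = orig_embedding_params // HIDDEN_SIZE
pruned_vocab = pruned_embedding_params // HIDDEN_SIZE

# Parameters removed by dropping embedding rows.
removed_params = orig_embedding_params - pruned_embedding_params

# Estimate, assuming only the word-embedding matrix was changed.
estimated_pruned_total = orig_total_params - removed_params

print(f"original vocabulary: {orig_vocab:,} tokens")
print(f"pruned vocabulary:   {pruned_vocab:,} tokens")
print(f"removed parameters:  {removed_params:,}")
print(f"estimated total:     {estimated_pruned_total:,}")
```

Both embedding counts divide evenly by 1024, which is consistent with the hidden-size assumption.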

Metric on SberQuAD (4122 questions)	intfloat/multilingual-e5-large	d0rj/e5-large-en-ru
recall@3	0.787239204269772	0.7882096069868996
map@3	0.7230713245997101	0.723192624939351
mrr@3	0.7241630276564784	0.7243651948892132
recall@5	0.8277535177098496	0.8284813197476953
map@5	0.7301603186155587	0.7302573588872716
mrr@5	0.7334667637069385	0.7335718906679607
recall@10	0.8716642406598738	0.871421639980592
map@10	0.7314774917730316	0.7313000338687417
mrr@10	0.7392223685527911	0.7391814537556898

