
MMLW Retrieval E5 Large

Developed by sdadas
MMLW is a neural text encoder for Polish, optimized for information retrieval tasks, capable of converting queries and passages into 1024-dimensional vectors.
Downloads: 56
Release time: 10/18/2023

Model Overview

This model is designed specifically for Polish information retrieval tasks. It is trained with multilingual knowledge distillation and contrastive-loss fine-tuning, and encodes queries and passages into high-dimensional vectors for similarity calculation.

Model Features

Multilingual knowledge distillation
Uses English FlagEmbeddings as the teacher model; the Polish encoder is distilled from it on 60 million Polish-English text pairs
Contrastive loss fine-tuning
Fine-tuned on the Polish version of the MS MARCO dataset with large-batch contrastive learning to optimize retrieval performance
Prefix-aware encoding
Improves retrieval accuracy by prepending 'query:' and 'passage:' prefixes so the model can distinguish query encoding from passage encoding, as in the sketch below
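
A minimal usage sketch of the prefix-aware encoding described above, using the sentence-transformers library. The Hugging Face model ID sdadas/mmlw-retrieval-e5-large and the example texts are illustrative assumptions, not details stated on this page.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed Hugging Face model ID; adjust if the published name differs.
model = SentenceTransformer("sdadas/mmlw-retrieval-e5-large")

# Queries and passages are encoded with different prefixes.
query = "query: Jak dbać o zdrowie serca?"
passages = [
    "passage: Regularna aktywność fizyczna wspiera układ krążenia.",
    "passage: Stolicą Polski jest Warszawa.",
]

query_emb = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)
passage_embs = model.encode(passages, convert_to_tensor=True, normalize_embeddings=True)

# Cosine similarity between the query vector and each passage vector.
scores = util.cos_sim(query_emb, passage_embs)
print(scores)  # higher score = more relevant passage
```

Normalizing the embeddings makes cosine similarity equivalent to a dot product, which is a common setup for dense retrieval.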

Model Capabilities

Text vectorization
Semantic similarity calculation
Information retrieval
Cross-language retrieval

Use Cases

Search engines
Polish document retrieval
Retrieves the most relevant content from a Polish document library based on user queries
Achieved an NDCG@10 score of 58.30 on the PIRB benchmark
Q&A systems
Polish FAQ matching
Semantically matches user questions against entries in an FAQ database (see the sketch below)
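
A hedged sketch of the FAQ-matching use case: the FAQ questions are encoded once as passages, and each incoming user question is encoded as a query and matched against them. The FAQ entries below are illustrative, and the model ID is the same assumption as in the earlier example.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sdadas/mmlw-retrieval-e5-large")  # assumed model ID

# Illustrative FAQ database, encoded once as passages.
faq = [
    "Jak zresetować hasło?",
    "Jak skontaktować się z obsługą klienta?",
    "Jakie są koszty dostawy?",
]
faq_embs = model.encode(["passage: " + q for q in faq], convert_to_tensor=True)

# Each user question is encoded as a query and matched against the FAQ.
user_question = "Zapomniałem hasła, co mam zrobić?"
query_emb = model.encode("query: " + user_question, convert_to_tensor=True)

best = util.semantic_search(query_emb, faq_embs, top_k=1)[0][0]
print(faq[best["corpus_id"]], best["score"])  # best-matching FAQ entry and its score
```

In practice the FAQ embeddings would be precomputed and stored, so only the user question needs to be encoded at query time.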