
MMLW Retrieval RoBERTa Large v2

Developed by sdadas
MMLW is a neural text encoder for Polish, optimized for information retrieval tasks, capable of converting queries and paragraphs into 1024-dimensional vectors.
Downloads 2,091
Release Time: 3/23/2025

Model Overview

This model is based on polish-roberta-large-v2. Through multilingual knowledge distillation and contrastive-loss fine-tuning, it incorporates knowledge from modern English retrievers and re-rankers built on large language models, improving retrieval performance.

Model Features

Multilingual knowledge distillation
Knowledge distillation is performed with stella_en_1.5B_v5 as the teacher model, improving retrieval quality.
Contrastive loss fine-tuning
The model is fine-tuned with a contrastive loss on a dataset of over 4 million queries, optimizing it for information retrieval.
High-dimensional vector representation
It converts queries and paragraphs into 1024-dimensional vectors, making it suitable for information retrieval tasks.
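The contrastive objective mentioned above can be sketched as an InfoNCE-style loss with in-batch negatives: each query is pulled toward its matching passage embedding and pushed away from the other passages in the batch. This is a generic sketch in numpy, not the authors' actual training code, and the temperature value is an illustrative assumption.

```python
import numpy as np

def info_nce_loss(query_vecs, passage_vecs, temperature=0.05):
    """Contrastive (InfoNCE) loss with in-batch negatives.

    Row i of query_vecs is assumed to match row i of passage_vecs;
    all other rows in the batch serve as negatives.
    """
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature              # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching passage for query i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

# Perfectly aligned toy batch: each query equals its passage, so the
# loss is close to zero.
batch = np.eye(3)
print(info_nce_loss(batch, batch))
```

Lower temperature sharpens the softmax, penalizing hard negatives more strongly.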

Model Capabilities

Information retrieval
Semantic text similarity calculation
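Retrieval with an encoder like this reduces to nearest-neighbor search over the vectors it produces: embed the query, embed the passages, and rank passages by cosine similarity. The sketch below uses small hand-made vectors as stand-ins for the model's real 1024-dimensional output.

```python
import numpy as np

def cosine_top_k(query_vec, passage_vecs, k=2):
    """Rank passages by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Toy 4-dimensional stand-ins for 1024-dimensional embeddings.
query = np.array([1.0, 0.0, 1.0, 0.0])
passages = np.array([
    [1.0, 0.1, 0.9, 0.0],   # nearly parallel to the query
    [0.0, 1.0, 0.0, 1.0],   # orthogonal to the query
    [0.5, 0.5, 0.5, 0.5],   # partially aligned
])
idx, scores = cosine_top_k(query, passages, k=2)
print(idx)  # → [0 2]
```

In production, the passage vectors would be precomputed once and indexed (e.g. with an approximate nearest-neighbor library) so only the query is encoded at search time.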

Use Cases

Information retrieval
Polish document retrieval
Match user queries against paragraphs in a document collection and return the most relevant documents.
Achieves an NDCG@10 of 60.71 on a Polish information retrieval benchmark.
Semantic similarity
Polish sentence similarity calculation
Calculate the semantic similarity between two Polish sentences.
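Once two sentences are encoded, their semantic similarity is just the cosine of the angle between their embeddings. A minimal helper (again with toy vectors standing in for real model output):

```python
import numpy as np

def sentence_similarity(v1, v2):
    """Cosine similarity between two sentence embeddings, in [-1, 1]."""
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Toy embeddings: identical sentences score 1.0, unrelated ones near 0.
a = np.array([0.2, 0.8, 0.1])
b = np.array([0.8, 0.0, 0.0])
print(sentence_similarity(a, a))  # → 1.0
print(sentence_similarity(a, b))
```

Scores near 1 indicate near-identical meaning; scores near 0 indicate unrelated sentences.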