Mmlw Retrieval Roberta Large

Developed by sdadas
MMLW (muszę mieć lepszą wiadomość, Polish for "I must have a better message") is a neural text encoder for Polish, optimized for information retrieval tasks.
Downloads 237.90k
Release Time: 10/18/2023

Model Overview

This model encodes queries and passages as 1024-dimensional vectors, primarily for Polish information retrieval. It was trained in two stages: first with multilingual knowledge distillation, then fine-tuned on the Polish version of the MS MARCO dataset.
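The two training stages can be illustrated with a minimal NumPy sketch. This is not the author's training code: it assumes the common formulation of multilingual knowledge distillation (a mean-squared-error loss toward the teacher's English embedding) and an in-batch-negatives contrastive loss. The function names and the temperature value are hypothetical.

```python
import numpy as np

def distillation_loss(student_en, student_pl, teacher_en):
    """Stage 1 (assumed form): pull the student's embeddings of an English
    sentence and of its Polish translation toward the teacher's embedding
    of the English sentence, using mean squared error."""
    mse_en = np.mean((student_en - teacher_en) ** 2)
    mse_pl = np.mean((student_pl - teacher_en) ** 2)
    return float(mse_en + mse_pl)

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def in_batch_contrastive_loss(queries, passages, temperature=0.05):
    """Stage 2 (assumed form): row i of `passages` is the positive for row i
    of `queries`; every other passage in the batch serves as a negative.
    The temperature value is an illustrative assumption."""
    q = l2_normalize(queries)
    p = l2_normalize(passages)
    logits = q @ p.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy against the diagonal (the matching query-passage pairs).
    return float(-np.mean(np.diag(log_probs)))
```

With in-batch negatives, a larger batch gives each query more negatives to be contrasted against, which is one reason large-batch training is commonly paired with this kind of loss.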

Model Features

Multilingual knowledge distillation
Trained on 60 million Polish-English text pairs, with English FlagEmbeddings (BGE) as the teacher model.
Contrastive loss fine-tuning
Fine-tuned on the Polish MS MARCO dataset with contrastive loss, employing a large-batch training strategy.
Specific prefix handling
Expects specific prefixes or suffixes when encoding text; for this model, every query must be prefixed with 'zapytanie:' (Polish for 'query:').
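A minimal sketch of the prefix handling, assuming the model is loaded through the sentence-transformers library. The Hub identifier "sdadas/mmlw-retrieval-roberta-large", the trailing space after the colon, and the helper names are assumptions not stated on this page. Passages are encoded without a prefix here, since only the query prefix is documented.

```python
QUERY_PREFIX = "zapytanie: "  # assumption: a single space follows the colon

def prepare_query(text: str) -> str:
    """Prepend the query prefix the model expects."""
    return QUERY_PREFIX + text.strip()

def encode_texts(queries, passages,
                 model_name="sdadas/mmlw-retrieval-roberta-large"):
    """Encode queries (prefixed) and passages (unprefixed) into vectors.

    The model name is an assumed Hub identifier; the import is done lazily
    so that prepare_query can be used without the library installed.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    query_vecs = model.encode([prepare_query(q) for q in queries],
                              convert_to_numpy=True)
    passage_vecs = model.encode(list(passages), convert_to_numpy=True)
    return query_vecs, passage_vecs
```

Calling encode_texts(["Jak dożyć 100 lat?"], ["Trzeba zdrowo się odżywiać i uprawiać sport."]) would download the model on first use and return two arrays of embeddings.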

Model Capabilities

Text encoding
Sentence similarity calculation
Information retrieval

Use Cases

Information retrieval
Q&A system
Used to build Polish Q&A systems that match each question with its most relevant answers.
Document retrieval
Retrieves the most relevant documents from a large collection of Polish texts based on queries.
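The retrieval step in these use cases can be sketched as a cosine-similarity ranking over precomputed embeddings. This is an illustrative sketch, not part of the model's own API: the function name top_k is hypothetical, and the inputs are assumed to be NumPy arrays produced by the encoder.

```python
import numpy as np

def top_k(query_vec, passage_vecs, k=3):
    """Return (passage_index, cosine_score) pairs for the k best passages.

    query_vec: array of shape (dim,)
    passage_vecs: array of shape (num_passages, dim)
    """
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                      # cosine similarity per passage
    order = np.argsort(-scores)[:k]     # highest score first
    return [(int(i), float(scores[i])) for i in order]
```

For a large collection, the exhaustive matrix product would typically be replaced by an approximate nearest-neighbour index, but the ranking criterion stays the same.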