🚀 MiniLM-L6-danish-reranker
This is a lightweight (~22M parameters) sentence-transformers cross-encoder model for Danish NLP. It takes two sentences as input and outputs a relevance score. The score can be used for information retrieval, for example to rank candidate passages by their relevance to a query.
A new version, trained on more data but otherwise identical, is available: KennethTM/MiniLM-L6-danish-reranker-v2
🚀 Quick Start
The maximum sequence length is 512 tokens, shared by both passages combined.
The model was not pre-trained from scratch. It was adapted from the English model cross-encoder/ms-marco-MiniLM-L-6-v2 with a Danish tokenizer.
It was trained on ELI5 and SQuAD data, machine-translated from English to Danish.
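The 512-token limit covers the query and the passage together, because a cross-encoder reads the pair as a single input sequence. The sketch below shows how that budget is split. It assumes a BERT-style pair layout `[CLS] query [SEP] passage [SEP]` (three special tokens), which is an assumption about this model's Danish tokenizer. The helper names are hypothetical. In Hugging Face tokenizers, `truncation=True` selects the `longest_first` strategy, which trims whichever sequence is longer. `truncation="only_second"` keeps the query whole and cuts only the passage, and that second behaviour is what this sketch models.

```python
# Sketch of how a 512-token limit is shared between query and passage.
# ASSUMPTION: BERT-style pair layout [CLS] query [SEP] passage [SEP],
# i.e. 3 special tokens; the real tokenizer may use a different layout.
MAX_LENGTH = 512
NUM_SPECIAL_TOKENS = 3  # assumed: [CLS], [SEP], [SEP]


def passage_token_budget(query_len: int,
                         max_length: int = MAX_LENGTH,
                         num_special: int = NUM_SPECIAL_TOKENS) -> int:
    """Number of passage tokens that still fit after the query and special tokens."""
    return max(0, max_length - num_special - query_len)


def truncate_passage(query_tokens, passage_tokens, max_length: int = MAX_LENGTH):
    """Keep the query whole and cut only the passage (like truncation='only_second')."""
    budget = passage_token_budget(len(query_tokens), max_length)
    return passage_tokens[:budget]


print(passage_token_budget(12))  # 512 - 3 - 12 = 497 passage tokens
```

Under these assumptions, a long query directly shrinks how much of the passage the model can see.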
✨ Features
- Lightweight with approximately 22 million parameters.
- Takes two sentences as input and outputs a relevance score.
- Can be used for information retrieval tasks.
- Adapted from an English model with a Danish tokenizer.
- Trained on Danish data machine-translated from ELI5 and SQuAD.
💻 Usage Examples
Basic Usage
Usage with Transformers
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained('KennethTM/MiniLM-L6-danish-reranker')
tokenizer = AutoTokenizer.from_pretrained('KennethTM/MiniLM-L6-danish-reranker')

# Tokenize (query, passage) pairs: the first list holds the queries, the second the passages
features = tokenizer(['Kører der cykler på vejen?', 'Kører der cykler på vejen?'],
                     ['En panda løber på vejen.', 'En mand kører hurtigt forbi på cykel.'],
                     padding=True, truncation=True, max_length=512, return_tensors="pt")

model.eval()
with torch.no_grad():
    # Raw relevance logits, one row per pair; higher means more relevant
    scores = model(**features).logits
print(scores)
```
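The Transformers example prints raw, unbounded logits. Assuming the model keeps the single-output head of its parent cross-encoder/ms-marco-MiniLM-L-6-v2, sentence-transformers' `CrossEncoder.predict` applies a sigmoid to that logit by default (this may depend on the installed version). The two examples can therefore print different numbers for the same pair. The minimal sketch below maps logits into (0, 1). The logit values are made-up illustrations, not actual model output.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to (0, 1). Monotonic, so the ranking of pairs is unchanged."""
    # Numerically stable form: avoids overflow in exp() for large negative x
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Illustrative values only, shaped like model(**features).logits.squeeze(-1).tolist()
logits = [-8.2, 5.1]
probs = [sigmoid(v) for v in logits]
print(probs)
```

Because the sigmoid is monotonic, it does not change the ranking. It only makes the scores easier to compare against a threshold.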
Usage with SentenceTransformers
Usage is simpler with the sentence-transformers library installed:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder('KennethTM/MiniLM-L6-danish-reranker', max_length=512)
# Score (query, passage) pairs; a higher score means the passage is more relevant to the query
scores = model.predict([('Kører der cykler på vejen?', 'En panda løber på vejen.'),
                        ('Kører der cykler på vejen?', 'En mand kører hurtigt forbi på cykel.')])
print(scores)
```
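To rerank retrieved candidates, score every (query, candidate) pair and sort by score. Recent sentence-transformers releases may already provide a `CrossEncoder.rank` method for this. The helper below spells out the logic so the scoring function can be swapped. The `rerank` and `toy_score` names are hypothetical. `toy_score` is a word-overlap stand-in so the sketch runs without downloading the model. It is not how the model scores pairs.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank(query: str,
           candidates: Sequence[str],
           score_pairs: Callable[[List[Tuple[str, str]]], Sequence[float]],
           top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and return candidates sorted best first."""
    pairs = [(query, candidate) for candidate in candidates]
    scores = [float(s) for s in score_pairs(pairs)]  # float() also accepts NumPy scalars
    # sorted() is stable, so candidates with equal scores keep their input order
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def toy_score(pairs: List[Tuple[str, str]]) -> List[int]:
    # Hypothetical stand-in for model.predict: counts shared lowercase words
    return [len(set(q.lower().split()) & set(p.lower().split())) for q, p in pairs]


query = 'Kører der cykler på vejen?'
candidates = ['En panda løber på vejen.', 'En mand kører hurtigt forbi på cykel.']
print(rerank(query, candidates, toy_score))
```

With the model loaded as in the SentenceTransformers example, pass its predict method instead, for example `rerank(query, candidates, model.predict, top_k=10)`.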
📄 License
This project is licensed under the MIT license.
| Property | Details |
|---|---|
| Model Type | A lightweight sentence-transformers model for Danish NLP |
| Training Data | ELI5 and SQuAD data machine-translated from English to Danish, including datasets like squad, eli5, sentence-transformers/embedding-training-data, KennethTM/squad_pairs_danish, KennethTM/eli5_question_answer_danish |
| Library Name | sentence-transformers |
| Pipeline Tag | text-ranking |