MiniLM-L6-danish-reranker Open-Source Reranking Model - Free for Danish Information Retrieval


Developed by KennethTM

This is a lightweight Danish text ranking model adapted from the English MiniLM-L6 model, specifically designed for Danish information retrieval tasks.

Text Ranking

Safetensors

Other | Open Source License: MIT | #Danish text ranking #Lightweight reranking model #Information retrieval optimization

Downloads 160

Release Time: 1/12/2024

Model Overview

The model takes two Danish sentences as input and outputs a relevance score, primarily used for ranking candidate results in information retrieval scenarios.
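Reranking with a pairwise scorer comes down to scoring each (query, candidate) pair and sorting by score. The sketch below shows that loop in plain Python. `rerank` and `toy_overlap_scorer` are hypothetical helpers, and the word-overlap scorer is only a stand-in so the example runs without downloading anything; with sentence-transformers installed, the `predict` method of `CrossEncoder('KennethTM/MiniLM-L6-danish-reranker')` could be passed as `score_pairs` instead.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a list of (query, passage) pairs to one score per pair.
# CrossEncoder.predict from sentence-transformers has this shape.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: List[str],
           score_pairs: Scorer) -> List[Tuple[str, float]]:
    """Score each (query, candidate) pair and sort candidates by descending score."""
    pairs = [(query, c) for c in candidates]
    scores = [float(s) for s in score_pairs(pairs)]
    # sorted() is stable (also with reverse=True), so ties keep their input order
    return sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)


def toy_overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in for the model: counts words shared case-insensitively."""
    scores = []
    for query, passage in pairs:
        q_words = set(query.lower().strip("?.").split())
        p_words = set(passage.lower().strip("?.").split())
        scores.append(float(len(q_words & p_words)))
    return scores


if __name__ == "__main__":
    ranked = rerank(
        "Kører der cykler på vejen?",
        ["En panda løber på vejen.", "Der kører mange cykler på vejen i dag."],
        toy_overlap_scorer,
    )
    for passage, score in ranked:
        print(f"{score:.1f}  {passage}")
```

Only the sort depends on the scores, so any monotonic transformation of the model's outputs (a sigmoid, for example) yields the same order.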

Model Features

Lightweight design

Only about 22M parameters, suitable for deployment in resource-limited environments

Danish optimization

Uses a Danish tokenizer and is trained on Danish data machine-translated from English

Long text support

Supports up to 512 tokens of combined input (query and passage together)

Transfer learning

Adapted from the English MiniLM-L6 model rather than trained from scratch

Model Capabilities

Text relevance scoring

Information retrieval ranking

Question answering system support

Use Cases

Information retrieval

Search engine result ranking

Re-ranking Danish search engine results by relevance

Improves the relevance of search results

Question answering system

Scoring the relevance of candidate answers in a QA system

Helps the system select the most relevant answer
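For the QA case, selecting an answer amounts to taking the candidate with the highest score. The sketch below assumes one raw logit per (question, answer) pair, for example the flattened `.logits` from the Transformers usage example if the head has a single output, and adds a hypothetical `min_prob` cut-off so the system can return nothing when even the best candidate looks irrelevant. The helper names and the 0.5 threshold are illustrative assumptions, not values from this model card; sigmoid outputs from a reranker are not necessarily calibrated probabilities, so any threshold would need tuning on held-out data.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function; monotonic, so it never changes a ranking."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def select_best_answer(answers: List[str], logits: Sequence[float],
                       min_prob: float = 0.5) -> Optional[str]:
    """Return the highest-scoring answer, or None if its sigmoid score is below min_prob.

    min_prob=0.5 is a hypothetical default (roughly: require logit >= 0), not a tuned value.
    """
    if len(answers) != len(logits):
        raise ValueError("expected exactly one logit per answer")
    if not answers:
        return None
    best = max(range(len(answers)), key=lambda i: logits[i])
    return answers[best] if sigmoid(logits[best]) >= min_prob else None


if __name__ == "__main__":
    # Hypothetical logits for three candidate answers to one question
    print(select_best_answer(["svar A", "svar B", "svar C"], [-2.0, 3.1, 0.4]))
```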

🚀 MiniLM-L6-danish-reranker

This is a lightweight (~22M parameters) sentence-transformers model for Danish NLP. It takes two sentences as input and outputs a relevance score, which can be used for information retrieval: given a query and a set of candidate matches, the candidates can be ranked by their relevance to the query.

A new version, trained on more data and otherwise identical, is available: KennethTM/MiniLM-L6-danish-reranker-v2

🚀 Quick Start


The maximum sequence length is 512 tokens for both passages combined.
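Because the 512-token limit covers the query and the passage together, long passages have to be truncated or split. The sketch below assumes a BERT-style pair encoding, `[CLS] query [SEP] passage [SEP]`, which adds three special tokens; that count is an assumption to verify against the actual tokenizer. `passage_token_budget` and `split_into_windows` are hypothetical helpers, and splitting a long passage into overlapping windows is a common workaround rather than something this model card prescribes.

```python
from typing import List, Sequence

MAX_SEQ_LEN = 512        # combined query + passage limit stated in the model card
PAIR_SPECIAL_TOKENS = 3  # assumption: BERT-style "[CLS] query [SEP] passage [SEP]"


def passage_token_budget(query_tokens: int, max_len: int = MAX_SEQ_LEN,
                         special_tokens: int = PAIR_SPECIAL_TOKENS) -> int:
    """Number of passage tokens that still fit after the query and special tokens."""
    if query_tokens < 0:
        raise ValueError("query_tokens must be non-negative")
    return max(0, max_len - special_tokens - query_tokens)


def split_into_windows(token_ids: Sequence[int], window: int, stride: int) -> List[List[int]]:
    """Split token ids into windows of at most `window` ids, one starting every `stride` ids.

    A stride smaller than `window` makes consecutive windows overlap.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break
        start += stride
    return windows


if __name__ == "__main__":
    budget = passage_token_budget(query_tokens=20)
    print(budget)  # 512 - 3 - 20 = 489
    long_passage = list(range(1200))  # hypothetical token ids
    windows = split_into_windows(long_passage, window=budget, stride=budget // 2)
    print([len(w) for w in windows])
```

One option is to score every window against the query and keep the maximum, though whether that suits a given corpus is an empirical question.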

The model was not pre-trained from scratch; it was adapted from the English cross-encoder/ms-marco-MiniLM-L-6-v2 model with a Danish tokenizer.

It was trained on ELI5 and SQuAD data machine-translated from English to Danish.

✨ Features

Lightweight with approximately 22 million parameters.
Takes two sentences as input and outputs a relevance score.
Can be used for information retrieval tasks.
Adapted from an English model with a Danish tokenizer.
Trained on machine-translated Danish data from ELI5 and SQuAD.

💻 Usage Examples

Basic Usage

Usage with Transformers

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the reranker and its Danish tokenizer from the Hugging Face Hub
model = AutoModelForSequenceClassification.from_pretrained('KennethTM/MiniLM-L6-danish-reranker')
tokenizer = AutoTokenizer.from_pretrained('KennethTM/MiniLM-L6-danish-reranker')

# Encode query/passage pairs: the first list holds the queries, the second the passages
features = tokenizer(['Kører der cykler på vejen?', 'Kører der cykler på vejen?'], ['En panda løber på vejen.', 'En mand kører hurtigt forbi på cykel.'], padding=True, truncation=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    # Raw relevance scores (logits), one row per pair; higher means more relevant
    scores = model(**features).logits
    print(scores)
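The example above scores both pairs in a single forward pass. For a larger candidate list, one option is to score in fixed-size batches to bound memory use. The sketch below is framework-agnostic: `batched`, `score_in_batches`, and the default batch size of 32 are illustrative assumptions, and the commented `score_batch` only suggests how the Transformers code above might be wrapped (untested).

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]
# Hypothetical interface: takes a non-empty batch of (query, passage) pairs
# and returns one score per pair, in the same order.
BatchScorer = Callable[[List[Pair]], Sequence[float]]


def batched(pairs: Iterable[Pair], batch_size: int) -> Iterator[List[Pair]]:
    """Yield consecutive lists of at most `batch_size` pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(pairs)
    while batch := list(islice(it, batch_size)):
        yield batch


def score_in_batches(query: str, passages: Iterable[str],
                     score_batch: BatchScorer, batch_size: int = 32) -> List[float]:
    """Score every (query, passage) pair one batch at a time, preserving input order."""
    scores: List[float] = []
    for batch in batched(((query, p) for p in passages), batch_size):
        batch_scores = list(score_batch(batch))
        if len(batch_scores) != len(batch):
            raise ValueError("scorer must return one score per pair")
        scores.extend(float(s) for s in batch_scores)
    return scores


# Untested sketch of a score_batch wrapping the Transformers example
# (assumes `model`, `tokenizer`, and `torch` are already loaded/imported
# and that the model outputs one logit per pair):
#
# def score_batch(batch):
#     queries, passages = zip(*batch)
#     features = tokenizer(list(queries), list(passages), padding=True,
#                          truncation=True, return_tensors="pt")
#     with torch.no_grad():
#         return model(**features).logits.squeeze(-1).tolist()
```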

Usage with SentenceTransformers

Usage is simpler with SentenceTransformers installed, since the model can then be loaded directly as a CrossEncoder:

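from sentence_transformers import CrossEncoder

# max_length=512 matches the model's combined query + passage limit
model = CrossEncoder('KennethTM/MiniLM-L6-danish-reranker', max_length=512)
# predict takes a list of (query, passage) pairs and returns one score per pair
scores = model.predict([('Kører der cykler på vejen?', 'En panda løber på vejen.'), ('Kører der cykler på vejen?', 'En mand kører hurtigt forbi på cykel.')])
print(scores)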

📄 License

This project is licensed under the MIT license.

Property	Details
Model Type	A lightweight sentence-transformers cross-encoder (reranker) for Danish NLP
Training Data	ELI5 and SQuAD data machine-translated from English to Danish; listed datasets: squad, eli5, sentence-transformers/embedding-training-data, KennethTM/squad_pairs_danish, KennethTM/eli5_question_answer_danish
Library Name	sentence-transformers
Pipeline Tag	text-ranking
