ms-marco-MiniLM-L2-v2 Open-source Model: Boosting Passage Relevance Scoring for Information Retrieval Queries

MS Marco MiniLM L2 v2

Developed by cross-encoder

A cross-encoder model trained on the MS Marco passage ranking task for query-passage relevance scoring in information retrieval.

Text Ranking | English | Open Source | License: Apache-2.0 | #Information Retrieval Ranking #High-Precision Reranking #English Semantic Matching

Downloads 533.42k

Release Time: 3/2/2022

Model Overview

This model is designed for information retrieval: it scores how relevant a passage is to a query and is intended for the reranking stage of a search engine.

Model Features

Efficient Reranking

Optimized for the reranking stage of information retrieval: with only 2 Transformer layers, it scores about 4,100 query-passage pairs per second on a V100 GPU.

Multiple Size Options

The MS Marco cross-encoder family offers variants ranging from TinyBERT-L2 to MiniLM-L12, trading ranking quality against speed.

High Performance

Achieves NDCG@10 of 71.01 on TREC Deep Learning 2019 and MRR@10 of 34.85 on the MS Marco passage reranking development set.
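The two metrics quoted above can be made concrete with a short sketch. The functions `mrr_at_k` and `ndcg_at_k` are hypothetical helpers, not part of any library, and the NDCG version assumes linear gains (the relevance grade itself); some evaluation tools use an exponential gain instead, so numbers may differ from official evaluation scripts.

```python
import math
from typing import Dict, Sequence, Set


def mrr_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage in the top k (0 if none)."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids: Sequence[str], grades: Dict[str, int], k: int = 10) -> float:
    """NDCG@k with linear gain (the grade) and a log2(rank + 1) discount."""

    def dcg(gains: Sequence[float]) -> float:
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

    actual = [grades.get(pid, 0) for pid in ranked_ids[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(actual) / best if best > 0 else 0.0


# Toy example: p1 is highly relevant (grade 3), p3 marginally relevant (grade 1).
ranking = ["p3", "p1", "p7"]
print(mrr_at_k(ranking, {"p1"}))                          # 0.5
print(round(ndcg_at_k(ranking, {"p1": 3, "p3": 1}), 4))   # 0.7967
```

MRR@10 only rewards the position of the first relevant passage, while NDCG@10 also accounts for graded relevance further down the list, which is why the two are reported on different datasets.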

Model Capabilities

Query-Passage Relevance Scoring

Information Retrieval Result Reranking

Use Cases

Search Engine Optimization

Search Result Reranking

Rerank preliminary retrieval results by relevance to improve search result quality

Achieves MRR@10 of 34.85 on the MS Marco development set (39.02 is the score of the larger MiniLM-L12-v2 variant)
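A common way to apply this is to rerank only the first k candidates from a cheaper first-stage retriever and leave the rest in their original order. The sketch below assumes a scoring callable that takes a list of (query, passage) pairs and returns one score per pair, which is how `CrossEncoder.predict` is called in the usage example. The function name `rerank_top_k`, the default `k=100`, and the word-overlap scorer are hypothetical stand-ins for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, passage) pairs to one relevance score per pair,
# e.g. CrossEncoder('cross-encoder/ms-marco-MiniLM-L2-v2').predict.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank_top_k(query: str, candidates: Sequence[str], score_pairs: Scorer, k: int = 100) -> List[str]:
    """Rerank the first k first-stage candidates; keep the remainder in original order."""
    head, tail = list(candidates[:k]), list(candidates[k:])
    if not head:
        return tail
    scores = score_pairs([(query, passage) for passage in head])
    # Highest score first; Python's sort is stable, so ties keep first-stage order.
    order = sorted(range(len(head)), key=lambda i: float(scores[i]), reverse=True)
    return [head[i] for i in order] + tail


def toy_overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in for the cross-encoder: counts shared lowercase words."""
    return [float(len(set(q.lower().split()) & set(p.lower().split()))) for q, p in pairs]


first_stage = ["museums in berlin", "berlin population figures", "paris", "berlin population census"]
print(rerank_top_k("berlin population", first_stage, toy_overlap_scorer, k=2))
# ['berlin population figures', 'museums in berlin', 'paris', 'berlin population census']
```

Capping reranking at k bounds the per-query cost of the cross-encoder, at the price of never promoting passages the first stage ranked below position k.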

Question Answering Systems

Answer Passage Filtering

Filter the most relevant results from candidate answer passages
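Filtering can be sketched as keeping only passages whose score clears a cutoff, capped at a small number of results. `filter_passages` is an illustrative helper, not a library function, and the defaults `min_logit=0.0` and `max_keep=3` are hypothetical values that would need tuning on real data. Because the model outputs raw logits, a cutoff of 0.0 corresponds to 0.5 after a logistic sigmoid.

```python
from typing import List, Sequence, Tuple


def filter_passages(passages: Sequence[str], logits: Sequence[float],
                    min_logit: float = 0.0, max_keep: int = 3) -> List[Tuple[str, float]]:
    """Keep at most max_keep passages with logit >= min_logit, best first."""
    scored = sorted(zip(passages, map(float, logits)), key=lambda item: item[1], reverse=True)
    return [(passage, logit) for passage, logit in scored if logit >= min_logit][:max_keep]


candidates = [
    "Berlin had a population of 3,520,031 registered inhabitants.",
    "Berlin is well known for its museums.",
    "Berlin is the capital of Germany.",
]
raw_logits = [8.51, -4.86, 1.2]  # hypothetical model outputs for illustration
for passage, logit in filter_passages(candidates, raw_logits):
    print(f"{logit:+.2f}  {passage}")
```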

🚀 Cross-Encoder for MS Marco

This model is designed for the MS Marco Passage Ranking task and can be used to rerank passages in information retrieval pipelines.

🚀 Quick Start

This model was trained on the MS Marco Passage Ranking task.

The model can be used for information retrieval: given a query, encode the query together with each candidate passage (e.g., passages retrieved with ElasticSearch), then sort the passages in decreasing order of score. See SBERT.net Retrieve & Re-rank for more details. The training code is available at SBERT.net Training MS Marco.
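The retrieve-and-rerank step amounts to scoring every (query, passage) pair and sorting by score. Below is a minimal sketch; `rerank` is a hypothetical helper, and the scorer is passed in so the function can be exercised without downloading the model. The stub scorer simply replays the two scores printed in the SentenceTransformers example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(query: str, passages: Sequence[str],
           score_pairs: Callable[[List[Pair]], Sequence[float]]) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair, then sort passages by decreasing score."""
    if not passages:
        return []
    scores = score_pairs([(query, passage) for passage in passages])
    return sorted(zip(passages, map(float, scores)), key=lambda item: item[1], reverse=True)


# Stub scorer replaying the scores printed in the SentenceTransformers example,
# so this sketch runs without downloading the model.
PUBLISHED_SCORES: Dict[str, float] = {
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.": 8.510401,
    "Berlin is well known for its museums.": -4.860082,
}


def stub_scorer(pairs: List[Pair]) -> List[float]:
    return [PUBLISHED_SCORES[passage] for _, passage in pairs]


query = "How many people live in Berlin?"
candidates = [
    "Berlin is well known for its museums.",
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
]
for passage, score in rerank(query, candidates, stub_scorer):
    print(f"{score:.2f}\t{passage}")
```

With sentence-transformers installed, passing `CrossEncoder('cross-encoder/ms-marco-MiniLM-L2-v2').predict` as `score_pairs` should produce the real ranking, since `predict` takes a list of (query, passage) pairs and returns one score per pair.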

✨ Features

Trained on MS Marco: Specifically designed for the MS Marco Passage Ranking task.
Useful for Information Retrieval: Sorts passages by their relevance to a given query.

📦 Installation

To use the model, install either sentence-transformers (for the CrossEncoder API) or transformers together with PyTorch (for direct use), for example with pip.

💻 Usage Examples

Basic Usage

Usage with SentenceTransformers

With SentenceTransformers installed, the pre-trained model can be used as follows:

from sentence_transformers import CrossEncoder

model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L2-v2')
scores = model.predict([
    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
    ("How many people live in Berlin?", "Berlin is well known for its museums."),
])
print(scores)
# [ 8.510401 -4.860082]
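The printed values are raw logits, not probabilities: higher means more relevant, and the scale is unbounded. If a score in (0, 1) is more convenient, for example for thresholding, one option is to apply a logistic sigmoid. The sketch below does this for the two scores printed above; `logit_to_unit_score` is a hypothetical helper, and the resulting values are not guaranteed to be calibrated probabilities.

```python
import math


def logit_to_unit_score(logit: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative logits."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


printed_scores = [8.510401, -4.860082]  # output of the example above
print([round(logit_to_unit_score(s), 4) for s in printed_scores])
# [0.9998, 0.0077]
```

Since the sigmoid is monotonic, it does not change the ranking; it only rescales the scores.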

Usage with Transformers

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = 'cross-encoder/ms-marco-MiniLM-L2-v2'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize (query, passage) pairs: the first list holds the queries, the second the matching passages
features = tokenizer(
    ['How many people live in Berlin?', 'How many people live in Berlin?'],
    ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
     'New York City is famous for the Metropolitan Museum of Art.'],
    padding=True, truncation=True, return_tensors="pt",
)

model.eval()  # inference mode (disables dropout)
with torch.no_grad():
    # One raw relevance logit per pair, shape (2, 1); higher means more relevant
    scores = model(**features).logits
    print(scores)
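For more than a handful of passages, the transformers path is usually run in batches to bound memory use. The helper below (`batched_pairs`, a hypothetical name; the batch size of 32 is an arbitrary default) splits one query's candidate passages into chunks shaped like the two lists passed to `tokenizer(...)` above.

```python
from typing import Iterator, List, Sequence, Tuple


def batched_pairs(query: str, passages: Sequence[str],
                  batch_size: int = 32) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (queries, passages) list pairs of at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(passages), batch_size):
        chunk = list(passages[start:start + batch_size])
        yield [query] * len(chunk), chunk


for queries, docs in batched_pairs("How many people live in Berlin?", ["p1", "p2", "p3"], batch_size=2):
    print(len(queries), docs)
```

Each yielded pair can be passed as `tokenizer(queries, docs, padding=True, truncation=True, return_tensors="pt")` and the per-batch logits concatenated. With sentence-transformers, `CrossEncoder.predict` already accepts a `batch_size` argument.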

📚 Documentation

Property	Details
Model Type	Cross-Encoder
Training Data	sentence-transformers/msmarco
Base Model	cross-encoder/ms-marco-MiniLM-L12-v2
Pipeline Tag	text-ranking
Library Name	sentence-transformers
Tags	transformers

The following table lists various pre-trained Cross-Encoders together with their performance on the TREC Deep Learning 2019 and MS Marco Passage Reranking datasets.

Model Name	NDCG@10 (TREC DL 19)	MRR@10 (MS Marco Dev)	Docs / Sec
Version 2 models
cross-encoder/ms-marco-TinyBERT-L2-v2	69.84	32.56	9000
cross-encoder/ms-marco-MiniLM-L2-v2	71.01	34.85	4100
cross-encoder/ms-marco-MiniLM-L4-v2	73.04	37.70	2500
cross-encoder/ms-marco-MiniLM-L6-v2	74.30	39.01	1800
cross-encoder/ms-marco-MiniLM-L12-v2	74.31	39.02	960
Version 1 models
cross-encoder/ms-marco-TinyBERT-L2	67.43	30.15	9000
cross-encoder/ms-marco-TinyBERT-L4	68.09	34.50	2900
cross-encoder/ms-marco-TinyBERT-L6	69.57	36.13	680
cross-encoder/ms-marco-electra-base	71.99	36.41	340
Other models
nboost/pt-tinybert-msmarco	63.63	28.80	2900
nboost/pt-bert-base-uncased-msmarco	70.94	34.75	340
nboost/pt-bert-large-msmarco	73.36	36.48	100
Capreolus/electra-base-msmarco	71.23	36.89	340
amberoad/bert-multilingual-passage-reranking-msmarco	68.40	35.54	330
sebastian-hofstaetter/distilbert-cat-margin_mse-T2-msmarco	72.82	37.88	720

Note: Throughput (Docs / Sec) was measured on a V100 GPU.
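The Docs / Sec column can be turned into a rough per-query reranking cost: scoring n candidates takes about n / throughput seconds. The sketch below uses the Version 2 rows of the table above to pick the model with the highest MRR@10 whose estimated cost fits a latency budget. It assumes throughput scales linearly, which is a back-of-the-envelope simplification; real latency depends on hardware, batch size, and passage length. `pick_model` and `estimated_seconds` are hypothetical helpers.

```python
from typing import Dict, Optional, Tuple

# Version 2 rows of the table: model -> (MRR@10 on MS Marco dev, docs/sec on a V100)
V2_MODELS: Dict[str, Tuple[float, int]] = {
    "cross-encoder/ms-marco-TinyBERT-L2-v2": (32.56, 9000),
    "cross-encoder/ms-marco-MiniLM-L2-v2": (34.85, 4100),
    "cross-encoder/ms-marco-MiniLM-L4-v2": (37.70, 2500),
    "cross-encoder/ms-marco-MiniLM-L6-v2": (39.01, 1800),
    "cross-encoder/ms-marco-MiniLM-L12-v2": (39.02, 960),
}


def estimated_seconds(n_candidates: int, docs_per_sec: int) -> float:
    """Naive linear estimate of the time to score n_candidates pairs."""
    return n_candidates / docs_per_sec


def pick_model(n_candidates: int, budget_seconds: float) -> Optional[str]:
    """Highest-MRR@10 model whose estimated reranking time fits the budget."""
    feasible = [
        (mrr, name)
        for name, (mrr, docs_per_sec) in V2_MODELS.items()
        if estimated_seconds(n_candidates, docs_per_sec) <= budget_seconds
    ]
    return max(feasible)[1] if feasible else None


print(pick_model(100, 0.030))  # MiniLM-L2-v2: about 24 ms for 100 candidates
print(pick_model(100, 0.060))  # MiniLM-L6-v2: about 56 ms for 100 candidates
```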

📄 License

This project is licensed under the Apache-2.0 license.
