distilbert-dot-margin_mse-T2-msmarco: Open-Source Model for Passage Re-ranking and Direct Retrieval

Distilbert Dot Margin Mse T2 Msmarco

Developed by sebastian-hofstaetter

DistilBERT-based dense retrieval model trained with knowledge distillation, suitable for passage re-ranking and direct retrieval tasks

Text Embedding

Transformers

English

#Dense Passage Retrieval #Knowledge Distillation Optimization #MSMARCO Adaptation


Release Time: 3/2/2022

Model Overview

This model uses a 6-layer DistilBERT architecture trained on the MSMARCO-Passage dataset with the Margin-MSE method. Queries and passages are encoded by the same layers, and each text is represented by its pooled CLS vector.

Model Features

Knowledge Distillation Training

Utilizes an ensemble of 3 BERT_Cat teacher models for efficient knowledge distillation via the Margin-MSE method

Shared Encoding Architecture

Queries and passages share the same BERT layers, improving performance while reducing memory requirements

Lightweight Design

Based on 6-layer DistilBERT, suitable for deployment on consumer-grade GPUs

Model Capabilities

Passage Retrieval

Candidate Set Re-ranking

Semantic Similarity Calculation

Use Cases

Information Retrieval

Search Engine Result Re-ranking

Re-ranks the top-1000 results of a traditional retrieval method such as BM25

Achieves MRR@10 of 0.332 on MSMARCO-DEV
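A minimal sketch of this re-ranking step, assuming the query and the BM25 candidates have already been encoded into CLS vectors. The 2-dimensional vectors and passage ids are hypothetical toy values (real DistilBERT vectors have 768 dimensions); each candidate is scored by its dot product with the query and the list is sorted by that score.

```python
# Sketch: re-rank a BM25 candidate list by BERT_Dot (dot-product) scores.
# Assumption: vectors were produced beforehand by the model; these are toy values.
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Relevance score: dot product of a query vector and a passage vector."""
    return sum(a * b for a, b in zip(u, v))


def rerank(query_vec: Sequence[float],
           candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return candidate passage ids sorted by descending dot-product score."""
    return sorted(candidates,
                  key=lambda pid: dot(query_vec, candidates[pid]),
                  reverse=True)


# Three hypothetical BM25 candidates (in the real setting: the top 1000).
query = [1.0, 0.5]
bm25_candidates = {
    "p1": [0.1, 0.2],  # score 0.2
    "p2": [0.9, 0.8],  # score 1.3
    "p3": [0.5, 0.0],  # score 0.5
}
print(rerank(query, bm25_candidates))  # ['p2', 'p3', 'p1']
```

Because re-ranking only reorders a fixed candidate list, it cannot change which passages are in the top 1000, which is why the re-ranking results report the same Recall@1K as BM25.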

Direct Dense Retrieval

Direct passage retrieval based on vector indexing

Achieves Recall@1K of 0.957 on MSMARCO-DEV
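For direct retrieval, the whole collection is encoded once and the query vector is matched against every passage vector by inner product. The sketch below uses an exact brute-force scan as a stand-in for a real vector index (an assumption for illustration; the index used by the authors is not specified here), with hypothetical toy vectors and ids.

```python
# Sketch: exact top-k maximum inner-product search over pre-computed passage vectors.
# Assumption: a brute-force scan stands in for a real vector index.
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_vec: Sequence[float],
             index: Dict[str, Sequence[float]],
             k: int = 1000) -> List[Tuple[str, float]]:
    """Return the k (passage_id, score) pairs with the highest dot product."""
    scored = ((pid, dot(query_vec, vec)) for pid, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


collection = {
    "p1": [0.2, 0.1],
    "p2": [0.0, 1.0],
    "p3": [1.0, 1.0],
    "p4": [0.5, 0.5],
}
print(retrieve([1.0, 0.0], collection, k=2))  # [('p3', 1.0), ('p4', 0.5)]
```

On a collection the size of MSMARCO-Passage (about 8.8M passages), an approximate nearest-neighbor index is a common way to trade a small amount of accuracy for much faster search.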

🚀 Margin-MSE Trained DistilBert for Dense Passage Retrieval

We offer a retrieval-trained DistilBert-based model (we name the architecture BERT_Dot). This model is trained with Margin-MSE using a 3-teacher BERT_Cat (concatenated BERT scoring) ensemble on MSMARCO-Passage.
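Margin-MSE trains the student to match the teacher's score margin between a relevant passage p+ and a non-relevant passage p- for the same query, rather than the raw scores. A minimal sketch in plain Python, assuming the ensemble's score for a pair is the mean of the three teachers' scores (an assumption; see the paper for the exact setup) and using made-up toy scores:

```python
# Sketch of the Margin-MSE loss for a batch of (query, p+, p-) triples.
# Assumption: the teacher-ensemble score is the mean of the individual teacher scores.
from statistics import mean
from typing import Sequence


def ensemble_score(teacher_scores: Sequence[float]) -> float:
    """Combine several teacher scores for one (query, passage) pair by averaging."""
    return mean(teacher_scores)


def margin_mse(student_pos: Sequence[float], student_neg: Sequence[float],
               teacher_pos: Sequence[float], teacher_neg: Sequence[float]) -> float:
    """Mean squared error between student margins and teacher margins."""
    errors = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    if not errors:
        raise ValueError("empty batch")
    return sum(errors) / len(errors)


# Toy batch of two triples, each scored by three hypothetical teachers.
teacher_pos = [ensemble_score(s) for s in ([10.0, 11.0, 12.0], [8.0, 8.0, 8.0])]  # [11.0, 8.0]
teacher_neg = [ensemble_score(s) for s in ([4.0, 5.0, 6.0], [7.0, 7.0, 7.0])]     # [5.0, 7.0]
student_pos = [20.0, 30.0]
student_neg = [15.0, 28.0]
print(margin_mse(student_pos, student_neg, teacher_pos, teacher_neg))  # 1.0
```

Because only the margins are compared, the dot-product student does not have to reproduce the absolute score range of the BERT_Cat teachers: in the example the student's raw scores are on a different scale from the teachers', yet the loss is only 1.0.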

This model instance can be used to re-rank a candidate set or directly for dense retrieval with a vector index. The architecture is a 6-layer DistilBERT without any architectural additions or modifications (only the weights change during training). To obtain a query or passage representation, we pool the CLS vector. We use the same BERT layers for both query and passage encoding, which yields better results and reduces memory requirements.
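A minimal sketch of how such an encoder might be used with Hugging Face `transformers`, assuming the checkpoint is on the Hub under `sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco` and loads with `AutoModel`/`AutoTokenizer` (the authors' own usage example is in the GitHub repository linked in this card). The helper names `cls_pool`, `dot_score`, and `encode` are hypothetical: one shared encoder produces both query and passage vectors, the CLS vector is taken from token position 0, and relevance is the dot product.

```python
# Sketch: shared DistilBERT encoder + CLS pooling + dot-product scoring.
# Assumptions: the Hub id below and that the checkpoint loads with AutoModel.
from typing import List, Sequence

MODEL_NAME = "sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco"  # assumed Hub id


def cls_pool(hidden_states: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Keep the first (CLS) token vector of each sequence: hidden_states[b][0]."""
    return [list(seq[0]) for seq in hidden_states]


def dot_score(q: Sequence[float], p: Sequence[float]) -> float:
    """BERT_Dot relevance score: dot product of query and passage vectors."""
    return sum(a * b for a, b in zip(q, p))


def encode(texts: List[str], model_name: str = MODEL_NAME) -> List[List[float]]:
    """Encode queries *or* passages with the same encoder (reloads it per call)."""
    # Imported lazily so the pooling/scoring helpers work without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)
    return cls_pool(hidden.tolist())


# Usage (downloads the checkpoint on first call):
# q_vec = encode(["what is knowledge distillation"])[0]
# p_vecs = encode(["first candidate passage", "second candidate passage"])
# scores = [dot_score(q_vec, p) for p in p_vecs]
```

In real use, load the tokenizer and model once and reuse them; they are loaded inside `encode` here only to keep the sketch short.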

If you want to learn more about the simple yet effective knowledge distillation method used to train this model instance, which works across various student architectures for efficient information retrieval, check out our paper: https://arxiv.org/abs/2010.02666 🎉

For more details, training data, source code, and a minimal usage example, please visit: https://github.com/sebastian-hofstaetter/neural-ranking-kd

✨ Features

Effectiveness on MSMARCO Passage & TREC-DL'19

We trained our model with knowledge distillation on the standard MSMARCO training triples (the "small" set with 400K queries), using a batch size of 32 on a single consumer-grade GPU with 11 GB of memory.

For re-ranking, we used the top-1000 BM25 results.

MSMARCO-DEV

Metric	BM25	Margin-MSE BERT_Dot (Re-ranking)	Margin-MSE BERT_Dot (Retrieval)
MRR@10	.194	.332	.323
NDCG@10	.241	.391	.381
Recall@1K	.868	.868 (from BM25 candidates)	.957
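The MSMARCO-DEV metrics are computed from a ranked list per query and the set of passages judged relevant for it. A minimal sketch of MRR@10 and Recall@1K, averaging per-query values over all judged queries, with hypothetical query and passage ids (NDCG@10 is omitted here):

```python
# Sketch: MRR@k and Recall@k with per-query values averaged over queries.
from typing import Dict, List, Set


def mrr_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranking[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranking: List[str], relevant: Set[str], k: int = 1000) -> float:
    """Fraction of the relevant passages that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


def mean_over_queries(metric, runs: Dict[str, List[str]],
                      qrels: Dict[str, Set[str]], **kwargs) -> float:
    """Average a per-query metric over all judged queries."""
    scores = [metric(runs.get(qid, []), rel, **kwargs) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


runs = {"q1": ["p3", "p1", "p7"], "q2": ["p9", "p8", "p2"]}
qrels = {"q1": {"p1"}, "q2": {"p2", "p5"}}
print(mean_over_queries(mrr_at_k, runs, qrels))     # (1/2 + 1/3) / 2, about 0.4167
print(mean_over_queries(recall_at_k, runs, qrels))  # (1.0 + 0.5) / 2 = 0.75
```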

TREC-DL'19

For MRR and Recall, we binarize the graded relevance labels at the recommended threshold of 2 (labels of 2 or higher count as relevant). Results may therefore differ from evaluations that use a different binarization threshold.
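To make the binarization concrete: TREC-DL'19 passage judgements are graded from 0 (irrelevant) to 3, and with a threshold of 2 only labels of 2 or 3 count as relevant. The toy sketch below (hypothetical ids and labels) shows how the same ranking gets a different reciprocal rank under thresholds 2 and 1:

```python
# Sketch: binarize graded relevance labels, then compute reciprocal rank@10.
from typing import Dict, List, Set


def binarize(graded: Dict[str, int], threshold: int = 2) -> Set[str]:
    """Treat passages with a graded label >= threshold as relevant."""
    return {pid for pid, grade in graded.items() if grade >= threshold}


def reciprocal_rank(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """1 / rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranking[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


graded_qrels = {"p1": 1, "p2": 0, "p3": 3}  # toy judgements for one query
ranking = ["p1", "p2", "p3"]
print(reciprocal_rank(ranking, binarize(graded_qrels, threshold=2)))  # 1/3, about 0.333
print(reciprocal_rank(ranking, binarize(graded_qrels, threshold=1)))  # 1.0
```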

Metric	BM25	Margin-MSE BERT_Dot (Re-ranking)	Margin-MSE BERT_Dot (Retrieval)
MRR@10	.689	.862	.868
NDCG@10	.501	.712	.697
Recall@1K	.739	.739 (from BM25 candidates)	.769

For more baselines, information, and analysis, please see the paper: https://arxiv.org/abs/2010.02666

Limitations & Bias

The model inherits social biases from both DistilBERT and MSMARCO.
The model is only trained on the relatively short passages of MSMARCO (60 words long on average), so it might struggle with longer text.

📄 License

Citation

If you use our model checkpoint, please cite our work as:

@misc{hofstaetter2020_crossarchitecture_kd,
      title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation}, 
      author={Sebastian Hofst{\"a}tter and Sophia Althammer and Michael Schr{\"o}der and Mete Sertkan and Allan Hanbury},
      year={2020},
      eprint={2010.02666},
      archivePrefix={arXiv},
      primaryClass={cs.IR}
}
