DistilBERT for Dense Passage Retrieval trained with Balanced Topic Aware Sampling (TAS-B)
This project provides a retrieval-trained, DistilBERT-based model (we call this dual-encoder, dot-product scoring architecture BERT_Dot), trained with Balanced Topic Aware Sampling on MSMARCO-Passage. The model can be used to re-rank a candidate set or for direct dense retrieval over a vector index.
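A minimal usage sketch, assuming the checkpoint is available through the Hugging Face transformers library; the model identifier, sequence lengths, and example texts below are illustrative assumptions, not part of this card:

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)  # the same DistilBERT encoder serves queries and passages

def encode(texts, max_length=200):
    # Tokenize, run the shared encoder, and pool the CLS vector as the representation.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0, :]  # CLS pooling

query_vec = encode(["what is dense passage retrieval"])
passage_vecs = encode([
    "Dense passage retrieval encodes queries and passages into single vectors ...",
    "BM25 is a classic sparse bag-of-words retrieval baseline ...",
])

scores = query_vec @ passage_vecs.T  # dot-product relevance scores, higher is better
print(scores)
```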
Features
- Training Configuration: Trained with a batch size of 256.
- Model Architecture: A 6-layer DistilBERT without any architectural additions or modifications (only the weights are changed during training). The CLS vector is pooled to obtain query/passage representations. The same BERT layers are used for both query and passage encoding, which yields better results and reduces memory requirements.
- Efficient Training: The batch composition procedure and dual supervision for dense retrieval training are efficient and can be completed on a single consumer GPU in 48 hours.
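Since each passage is represented by a single vector, direct dense retrieval reduces to nearest-neighbor search over pre-computed passage vectors. Below is a small sketch under the assumption of FAISS as the index library, reusing the `encode` helper from the snippet above; corpus and query texts are placeholders.

```python
import faiss  # assumed index-library choice; any flat or ANN inner-product index works
import numpy as np

passages = ["first passage text ...", "second passage text ...", "third passage text ..."]
passage_vecs = encode(passages).numpy().astype(np.float32)

index = faiss.IndexFlatIP(passage_vecs.shape[1])  # inner product == dot-product scoring
index.add(passage_vecs)

query_vec = encode(["example query"]).numpy().astype(np.float32)
scores, ids = index.search(query_vec, 2)  # top-2 passages by dot product
for score, i in zip(scores[0], ids[0]):
    print(round(float(score), 3), passages[i])
```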
Documentation
Effectiveness on MSMARCO Passage & TREC-DL'19
The model is trained on the standard MSMARCO ("small", ~400K query) training triples, re-sampled with the TAS-B method. Training uses dual supervision: BERT_CAT pairwise scores serve as the teacher for the pairwise signal, and a ColBERT model provides the teacher signals for the in-batch negatives.
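For intuition, the pairwise part of this dual supervision follows the Margin-MSE idea: the student's score margin between a positive and a negative passage is regressed onto the teacher's margin. A hedged sketch (function and variable names are illustrative, not the paper's code):

```python
import torch

def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    # Match the student's (query, pos) - (query, neg) score margin to the teacher's margin.
    return torch.nn.functional.mse_loss(student_pos - student_neg,
                                        teacher_pos - teacher_neg)
```

In the paper, an analogous distillation signal over the in-batch negatives (with ColBERT as the teacher) is added to this pairwise term.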
MSMARCO-DEV (7K)
|                            | MRR@10 | NDCG@10 | Recall@1K |
|----------------------------|--------|---------|-----------|
| BM25                       | .194   | .241    | .857      |
| TAS-B BERT_Dot (Retrieval) | .347   | .410    | .978      |
TREC-DL'19
For MRR and Recall, the recommended binarization point of graded relevance ≥ 2 is used; results may therefore not be directly comparable to evaluations that binarize at a different point.
|                            | MRR@10 | NDCG@10 | Recall@1K |
|----------------------------|--------|---------|-----------|
| BM25                       | .689   | .501    | .739      |
| TAS-B BERT_Dot (Retrieval) | .883   | .717    | .843      |
TREC-DL'20
Similarly, for MRR and Recall, the recommended binarization point of graded relevance ≥ 2 is used.
|                            | MRR@10 | NDCG@10 | Recall@1K |
|----------------------------|--------|---------|-----------|
| BM25                       | .649   | .475    | .806      |
| TAS-B BERT_Dot (Retrieval) | .843   | .686    | .875      |
For more baselines, information, and analysis, please refer to the paper: Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling
Limitations & Bias
- Social Biases: The model inherits social biases from both DistilBERT and MSMARCO.
- Text Length Limitation: The model was trained only on the relatively short passages of MSMARCO (about 60 words on average), so it may have difficulties with longer text.
Citation
If you use our model checkpoint, please cite our work as follows:
@inproceedings{Hofstaetter2021_tasb_dense_retrieval,
author = {Sebastian Hofst{\"a}tter and Sheng-Chieh Lin and Jheng-Hong Yang and Jimmy Lin and Allan Hanbury},
title = {{Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling}},
booktitle = {Proc. of SIGIR},
year = {2021},
}
Quick Start
For more information and a minimal usage example, please visit: tas-balanced-dense-retrieval
Usage Tip
If you want to know more about our efficient batch composition procedure and dual supervision for dense retrieval training, check out our paper: Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling