8-layer distillation from BAAI/bge-m3 with 2.5x speedup
This is an embedding model distilled from BAAI/bge-m3 on a combination of public and proprietary datasets. It offers a 2.5x speedup with little-to-no loss in retrieval performance, featuring an 8-layer architecture (instead of 24 layers) and roughly 366M parameters.
Quick Start
First, install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("altaidevorg/bge-m3-distill-8l")
sentences = [
'That is a happy person',
'That is a happy dog',
'That is a very happy person',
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # 3 sentences x 1024 dimensions

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # 3 x 3 similarity matrix
Features
- High-Speed Inference: Achieves a 2.5x throughput increase (454 texts/sec vs. 175 texts/sec for the teacher, measured on a T4 Colab GPU); a measurement sketch follows this list.
- Retrieval Performance: Maintains high retrieval performance, with a Spearman cosine score of 0.965 and an MSE of 0.006 on the test subset.
- Multilingual Capability: Generalizes well to other languages, e.g., a Spearman cosine score of 0.938 on a collection of 10k English texts.
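The throughput figures above come from the authors' own T4 Colab benchmark; the exact script is not published. Below is a minimal, hypothetical sketch of how such a comparison could be reproduced (the corpus, batch size, and device are placeholders, not the original setup):

import time
from sentence_transformers import SentenceTransformer

texts = ["This is a sample sentence."] * 2000  # placeholder corpus, not the original benchmark data

for name in ["BAAI/bge-m3", "altaidevorg/bge-m3-distill-8l"]:
    model = SentenceTransformer(name, device="cuda")
    model.encode(texts[:64])  # warm-up pass so one-time setup does not skew timing
    start = time.perf_counter()
    model.encode(texts, batch_size=32)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(texts) / elapsed:.0f} texts/sec")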
Documentation
Motivation
We are a team with experience in building real-world semantic search and RAG systems. BAAI/bge-m3 is useful across many domains and use cases, especially in multilingual settings. However, its large size makes it costly to serve large user groups with low latency and to index large volumes of data. Our goal was therefore to achieve similar retrieval performance with a smaller, faster model. We created a 10M-text dataset and applied knowledge distillation to reduce the number of layers from 24 to 8. The results were promising, and we also observed a 2.5x throughput increase.
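The exact training code is not published. The following is a minimal sketch of the layer-truncation plus embedding-regression recipe described above, using the Sentence Transformers MSELoss cited at the bottom of this card. Keeping the first 8 of 24 layers is an assumption about how the student was initialized, and the two example sentences stand in for the 10M-text dataset.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

teacher = SentenceTransformer("BAAI/bge-m3")
student = SentenceTransformer("BAAI/bge-m3")

# Truncate the student to its first 8 transformer layers (assumed initialization scheme).
backbone = student[0].auto_model
backbone.encoder.layer = backbone.encoder.layer[:8]
backbone.config.num_hidden_layers = 8

# Teacher embeddings act as regression targets for the student.
sentences = ["That is a happy person", "That is a happy dog"]  # placeholder training texts
targets = teacher.encode(sentences)
examples = [InputExample(texts=[s], label=t) for s, t in zip(sentences, targets)]

loader = DataLoader(examples, shuffle=True, batch_size=2)
loss = losses.MSELoss(model=student)  # MSE between student and teacher embeddings
student.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=0)
student.save("bge-m3-distill-8l")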
Future Work
Our model shows good performance in multiple languages even though the training dataset consists mainly of Turkish texts. We plan a second-version distillation trained on a larger, multilingual dataset, as well as an even smaller distilled model. Stay tuned for updates, and feel free to contact us about collaboration.
Model Details
Model Description
| Property | Details |
|---|---|
| Model Type | Sentence Transformer |
| Base model | BAAI/bge-m3 |
| Maximum Sequence Length | 8192 tokens |
| Output Dimensionality | 1024 dimensions |
| Similarity Function | Cosine Similarity |
| Training Dataset | 10M texts from diverse domains |
Model Sources
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
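A quick way to sanity-check the configuration listed above (layer count, embedding dimensionality, and sequence length) after loading the model:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("altaidevorg/bge-m3-distill-8l")
print(model[0].auto_model.config.num_hidden_layers)  # expected: 8
print(model.get_sentence_embedding_dimension())      # expected: 1024
print(model.max_seq_length)                          # expected: 8192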
Evaluation
Metrics
Semantic Similarity
| Metric | sts-dev | sts-test |
|---|---|---|
| pearson_cosine | 0.9691 | 0.9691 |
| spearman_cosine | 0.965 | 0.9651 |
Knowledge Distillation
| Metric | Value |
|---|---|
| negative_mse | -0.0064 |
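The evaluation data behind these numbers is not published. As a rough sketch, both metric families can be computed with the library's built-in evaluators: EmbeddingSimilarityEvaluator for the Pearson/Spearman cosine scores and MSEEvaluator for the (negative) MSE against the teacher. The sentence pairs and gold scores below are placeholders.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, MSEEvaluator

teacher = SentenceTransformer("BAAI/bge-m3")
student = SentenceTransformer("altaidevorg/bge-m3-distill-8l")

# Placeholder sentence pairs with gold similarity scores in [0, 1]
s1 = ["That is a happy person", "A man is playing guitar"]
s2 = ["That is a very happy person", "Someone plays an instrument"]
gold = [0.9, 0.7]

sts_eval = EmbeddingSimilarityEvaluator(s1, s2, gold, name="sts-dev")
print(sts_eval(student))  # reports pearson_cosine / spearman_cosine

mse_eval = MSEEvaluator(source_sentences=s1, target_sentences=s1, teacher_model=teacher, name="distill")
print(mse_eval(student))  # reports negative_mse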
Training Details
Training Dataset
- Size: 9,623,924 training samples
- Columns: sentence and label
- Approximate statistics based on the first 1000 samples:

| | sentence | label |
|---|---|---|
| type | string | list |
| details | min: 5 tokens, mean: 55.78 tokens, max: 468 tokens | |
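Given the column types above (a text string plus a list-valued label), each training row presumably pairs a sentence with its 1024-dimensional teacher embedding, as in this hypothetical example:

# Hypothetical row layout; values are illustrative, not taken from the actual dataset.
sample = {
    "sentence": "Örnek bir Türkçe cümle.",  # training texts are mostly Turkish
    "label": [0.0123, -0.0456, 0.0789],     # truncated; real target vectors have 1024 floats
}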
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MSELoss
@inproceedings{reimers-2020-multilingual-sentence-bert,
title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2020",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2004.09813",
}
bge-m3
@misc{bge-m3,
title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
year={2024},
eprint={2402.03216},
archivePrefix={arXiv},
primaryClass={cs.CL}
}