8-layer distillation from BAAI/bge-m3 with 2.5x speedup
This is an embedding model distilled from BAAI/bge-m3 on a combination of public and proprietary datasets. It offers a 2.5x speedup with little-to-no loss in retrieval performance, featuring an 8-layer architecture (instead of 24 layers) and roughly 366M parameters.
Quick Start
First, install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("altaidevorg/bge-m3-distill-8l")
sentences = [
'That is a happy person',
'That is a happy dog',
'That is a very happy person',
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # 3 sentences x 1024 dimensions

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # 3 x 3 similarity matrix
Features
- High-Speed Inference: Achieves a 2.5x throughput increase (454 texts/sec vs. 175 texts/sec for the teacher, measured on a T4 Colab GPU); a measurement sketch follows this list.
- Retrieval Performance: Maintains high retrieval performance, with a Spearman cosine score of 0.965 and an MSE of 0.006 on the test subset.
- Multilingual Capability: Generalizes well to other languages, e.g., a Spearman cosine score of 0.938 on a collection of 10k English texts.
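The throughput figures above come from the authors' own T4 Colab benchmark; the exact script is not published. Below is a minimal, hypothetical sketch of how such a comparison could be reproduced (the corpus, batch size, and device are placeholders, not the original setup):

import time
from sentence_transformers import SentenceTransformer

texts = ["This is a sample sentence."] * 2000  # placeholder corpus, not the original benchmark data

for name in ["BAAI/bge-m3", "altaidevorg/bge-m3-distill-8l"]:
    model = SentenceTransformer(name, device="cuda")
    model.encode(texts[:64])  # warm-up pass so one-time setup does not skew timing
    start = time.perf_counter()
    model.encode(texts, batch_size=32)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(texts) / elapsed:.0f} texts/sec")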
Documentation
Motivation
We are a team with experience in building real-world semantic search and RAG systems. BAAI/bge-m3 is useful across many domains and use cases, especially in multilingual settings. However, its large size makes it costly to serve large user groups with low latency and to index large volumes of data. Our goal was therefore to achieve similar retrieval performance with a smaller, faster model. We created a 10M-text dataset and applied knowledge distillation to reduce the number of layers from 24 to 8. The results were promising, and we also observed a 2.5x throughput increase.
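The exact training code is not published. The following is a minimal sketch of the layer-truncation plus embedding-regression recipe described above, using the Sentence Transformers MSELoss cited at the bottom of this card. Keeping the first 8 of 24 layers is an assumption about how the student was initialized, and the two example sentences stand in for the 10M-text dataset.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

teacher = SentenceTransformer("BAAI/bge-m3")
student = SentenceTransformer("BAAI/bge-m3")

# Truncate the student to its first 8 transformer layers (assumed initialization scheme).
backbone = student[0].auto_model
backbone.encoder.layer = backbone.encoder.layer[:8]
backbone.config.num_hidden_layers = 8

# Teacher embeddings act as regression targets for the student.
sentences = ["That is a happy person", "That is a happy dog"]  # placeholder training texts
targets = teacher.encode(sentences)
examples = [InputExample(texts=[s], label=t) for s, t in zip(sentences, targets)]

loader = DataLoader(examples, shuffle=True, batch_size=2)
loss = losses.MSELoss(model=student)  # MSE between student and teacher embeddings
student.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=0)
student.save("bge-m3-distill-8l")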
Future Work
Our model shows good performance in multiple languages even though the training dataset consists mainly of Turkish texts. We plan a second-version distillation trained on a larger, multilingual dataset, as well as an even smaller distilled model. Stay tuned for updates, and feel free to contact us about collaboration.
Model Details
Model Description
| Property | Details |
|---|---|
| Model Type | Sentence Transformer |
| Base model | BAAI/bge-m3 |
| Maximum Sequence Length | 8192 tokens |
| Output Dimensionality | 1024 dimensions |
| Similarity Function | Cosine Similarity |
| Training Dataset | 10M texts from diverse domains |
Model Sources
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
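A quick way to sanity-check the configuration listed above (layer count, embedding dimensionality, and sequence length) after loading the model:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("altaidevorg/bge-m3-distill-8l")
print(model[0].auto_model.config.num_hidden_layers)  # expected: 8
print(model.get_sentence_embedding_dimension())      # expected: 1024
print(model.max_seq_length)                          # expected: 8192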
Evaluation
Metrics
Semantic Similarity
| Metric | sts-dev | sts-test |
|---|---|---|
| pearson_cosine | 0.9691 | 0.9691 |
| spearman_cosine | 0.965 | 0.9651 |
Knowledge Distillation
| Metric | Value |
|---|---|
| negative_mse | -0.0064 |
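The evaluation data behind these numbers is not published. As a rough sketch, both metric families can be computed with the library's built-in evaluators: EmbeddingSimilarityEvaluator for the Pearson/Spearman cosine scores and MSEEvaluator for the (negative) MSE against the teacher. The sentence pairs and gold scores below are placeholders.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, MSEEvaluator

teacher = SentenceTransformer("BAAI/bge-m3")
student = SentenceTransformer("altaidevorg/bge-m3-distill-8l")

# Placeholder sentence pairs with gold similarity scores in [0, 1]
s1 = ["That is a happy person", "A man is playing guitar"]
s2 = ["That is a very happy person", "Someone plays an instrument"]
gold = [0.9, 0.7]

sts_eval = EmbeddingSimilarityEvaluator(s1, s2, gold, name="sts-dev")
print(sts_eval(student))  # reports pearson_cosine / spearman_cosine

mse_eval = MSEEvaluator(source_sentences=s1, target_sentences=s1, teacher_model=teacher, name="distill")
print(mse_eval(student))  # reports negative_mse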
Training Details
Training Dataset
- Size: 9,623,924 training samples
- Columns: sentence and label
- Approximate statistics based on the first 1000 samples:

| | sentence | label |
|---|---|---|
| type | string | list |
| details | min: 5 tokens, mean: 55.78 tokens, max: 468 tokens | |
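Given the column types above (a text string plus a list-valued label), each training row presumably pairs a sentence with its 1024-dimensional teacher embedding, as in this hypothetical example:

# Hypothetical row layout; values are illustrative, not taken from the actual dataset.
sample = {
    "sentence": "Örnek bir Türkçe cümle.",  # training texts are mostly Turkish
    "label": [0.0123, -0.0456, 0.0789],     # truncated; real target vectors have 1024 floats
}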
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MSELoss
@inproceedings{reimers-2020-multilingual-sentence-bert,
title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2020",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2004.09813",
}
bge-m3
@misc{bge-m3,
title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
year={2024},
eprint={2402.03216},
archivePrefix={arXiv},
primaryClass={cs.CL}
}