# 🚀 BGE-M3 ONNX Model
The BGE-M3 model converted to ONNX weights using HF Optimum, ensuring compatibility with tools like ONNX Runtime.
This ONNX model simultaneously outputs dense, sparse, and ColBERT embedding representations. The output is a list of numpy arrays in the order of the representations mentioned above.
## ⚠️ Important Note
Dense and ColBERT embeddings are normalized, following the default behavior of the original FlagEmbedding library. If you need unnormalized outputs, modify the code in `bgem3_model.py` and re-run the ONNX export using the `export_onnx.py` script.

This ONNX model also has "O2"-level graph optimizations applied. You can find more information about optimization levels here. If you want an ONNX model with different optimizations or no optimizations at all, re-run the ONNX export script `export_onnx.py` with the appropriate optimization argument.
## 🚀 Quick Start
### ✨ Features
- Outputs dense, sparse, and ColBERT embedding representations simultaneously.
- Supports "O2" level graph optimizations.
### 📦 Installation
If you haven't already, install the ONNX Runtime Python library with pip:
```bash
pip install onnxruntime==1.17.0
```
For tokenization, install HF Transformers with pip:
```bash
pip install transformers==4.37.2
```
Clone this repository with Git LFS to obtain the ONNX model files.
### 💻 Usage Examples
#### Basic Usage
You can use the model to compute embeddings as follows:
```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
ort_session = ort.InferenceSession("model.onnx")

inputs = tokenizer("BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.", padding="longest", return_tensors="np")
inputs_onnx = {k: ort.OrtValue.ortvalue_from_numpy(v) for k, v in inputs.items()}

# Returns a list of NumPy arrays: dense, sparse, and ColBERT representations, in that order.
outputs = ort_session.run(None, inputs_onnx)
```
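The session returns the three representations in the order described above, so you can unpack them directly. The short sketch below also uses the fact that the dense vectors are normalized, so a plain dot product already gives cosine similarities (the shape comments are indicative rather than guaranteed):

```python
# Unpack in the documented order: dense, sparse, ColBERT.
dense_vecs, sparse_vecs, colbert_vecs = outputs

print(dense_vecs.shape)    # roughly (batch_size, hidden_dim)
print(sparse_vecs.shape)   # roughly (batch_size, sequence_length, 1) -- per-token weights
print(colbert_vecs.shape)  # roughly (batch_size, sequence_length, hidden_dim)

# Dense vectors are normalized, so dot products are cosine similarities.
similarities = dense_vecs @ dense_vecs.T
```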
#### Advanced Usage
You can use the following sparse token weight processor from FlagEmbedding to turn the ONNX model's sparse output into the same per-token weight dictionaries that the original library returns:
```python
import numpy as np
from collections import defaultdict


def process_token_weights(token_weights: np.ndarray, input_ids: list):
    # Convert per-token weights into a {token_id: weight} dict, keeping the
    # highest weight seen for each token and skipping special tokens.
    result = defaultdict(int)
    unused_tokens = set(
        [
            tokenizer.cls_token_id,
            tokenizer.eos_token_id,
            tokenizer.pad_token_id,
            tokenizer.unk_token_id,
        ]
    )
    for w, idx in zip(token_weights, input_ids):
        if idx not in unused_tokens and w > 0:
            idx = str(idx)
            if w > result[idx]:
                result[idx] = w
    return result


# The second output is the sparse representation: one weight per input token.
token_weights = outputs[1].squeeze(-1)
lexical_weights = list(
    map(process_token_weights, token_weights, inputs["input_ids"].tolist())
)
```
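As a small illustration of how these weights can be used, the sketch below mirrors FlagEmbedding's lexical matching score, i.e. the sum of weight products over token ids shared by two texts; the helper name is ours and not part of this repository:

```python
def lexical_matching_score(weights_1: dict, weights_2: dict) -> float:
    # Sum the products of the weights of token ids present in both texts.
    return sum(w * weights_2[token] for token, w in weights_1.items() if token in weights_2)


# Example: a text scored against itself gives its maximum lexical score.
score = lexical_matching_score(lexical_weights[0], lexical_weights[0])
```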
## 📦 Export ONNX weights
You can export ONNX weights using the provided custom BGE-M3 PyTorch model file `bgem3_model.py` and the ONNX weight export script `export_onnx.py`, which leverages HF Optimum. If necessary, modify the `bgem3_model.py` model configuration, for example to remove embedding normalization or to change the number of output representations. If you change the number of output representations, also modify the ONNX output config `BGEM3OnnxConfig` in `export_onnx.py`.
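For orientation, an Optimum-style ONNX output config maps each output name to its dynamic axes. The snippet below is only a hypothetical sketch of what such a class can look like; the actual class body, output names, and axes in `export_onnx.py` may differ:

```python
from optimum.exporters.onnx.model_configs import XLMRobertaOnnxConfig


class BGEM3OnnxConfig(XLMRobertaOnnxConfig):
    @property
    def outputs(self):
        # One entry per exported representation; remove (or add) entries here
        # if you change the outputs returned by the model in bgem3_model.py.
        return {
            "dense_vecs": {0: "batch_size"},
            "sparse_vecs": {0: "batch_size", 1: "sequence_length"},
            "colbert_vecs": {0: "batch_size", 1: "sequence_length"},
        }
```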
First, install the required Python packages:
```bash
pip install -r requirements.txt
```
Then, export ONNX weights:
```bash
python export_onnx.py --output . --opset 17 --device cpu --optimize O2
```
You can read more about the optional optimization levels here.
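Once the export completes, a quick sanity check is to load the exported graph with ONNX Runtime and list its inputs and outputs (a minimal sketch, assuming the export wrote `model.onnx` into the chosen `--output` directory):

```python
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")
print([i.name for i in sess.get_inputs()])   # tokenizer inputs, e.g. input_ids and attention_mask
print([o.name for o in sess.get_outputs()])  # the exported embedding representations
```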
## 📄 License
This project is licensed under the MIT license.