# 🚀 Model Card for vectorizer.vanilla

This model, `vectorizer.vanilla`, developed by Sinequa, is a vectorizer that generates an embedding vector for a given passage or query. Passage vectors are stored in the vector index, and the query vector is used at query time to search for relevant passages in the index.
## 🚀 Quick Start

This model is ready to generate embedding vectors for passages and queries. You can start using it in any Sinequa environment that meets the requirements listed below.
## ✨ Features

- Multilingual Support: Although trained and tested mainly in English, it has the potential for broader language applications.
- Efficient Inference: Offers fast inference times on various NVIDIA GPUs across different quantization types and batch sizes.
- Low Memory Usage: Consumes relatively little GPU memory, especially with FP16 quantization.
## 📦 Installation

### Requirements

- Sinequa version:
  - Minimal Sinequa version: 11.10.0
  - Minimal Sinequa version for using FP16 models and GPUs with CUDA compute capability of 8.9+ (like NVIDIA L4): 11.11.0
- CUDA compute capability: above 5.0 (above 6.0 for FP16 use); see the sketch below for a quick way to check this.
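You can verify your GPU's compute capability before enabling FP16. This is a minimal sketch assuming PyTorch with CUDA support is installed in your environment; it is not part of the Sinequa API:

```python
import torch  # assumption: PyTorch with CUDA support is available

def meets_requirement(fp16: bool = False) -> bool:
    """Check the local GPU against the compute capability requirements above."""
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability(0)
    required = (6, 0) if fp16 else (5, 0)  # above 6.0 for FP16, above 5.0 otherwise
    return (major, minor) > required

print(meets_requirement(fp16=True))
```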
## 💻 Usage Examples

### Basic Usage

The model generates embedding vectors for passages and queries in your Sinequa application. Here is a high-level sketch of how it might be used:
```python
# Illustrative pseudocode: `sinequa_model_api` and `get_vector` are placeholders
# for the actual Sinequa integration, which is configured in the product itself.
from sinequa_model_api import get_vector

passage = "This is a sample passage."
query = "Find relevant passages."
passage_vector = get_vector(passage)  # stored in the vector index at indexing time
query_vector = get_vector(query)      # used to search the index at query time
```
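At query time, relevance is determined by comparing the query vector against the stored passage vectors. The following is a minimal, self-contained sketch of such a search using cosine similarity over an in-memory matrix; the scoring used inside Sinequa's actual vector index is not specified here, so treat this as an assumption:

```python
import numpy as np

def search(query_vector, passage_vectors, top_k=5):
    """Rank passages by cosine similarity to the query vector."""
    q = np.asarray(query_vector, dtype=np.float32)
    p = np.asarray(passage_vectors, dtype=np.float32)  # shape: (n_passages, 256)
    q = q / np.linalg.norm(q)                          # normalize so the dot product
    p = p / np.linalg.norm(p, axis=1, keepdims=True)   # equals cosine similarity
    scores = p @ q
    top = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in top]
```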
## 📚 Documentation

### Supported Languages

The model was trained and tested in the following language:

- English (`en`)
### Scores

| Metric                 | Value |
|------------------------|-------|
| Relevance (Recall@100) | 0.639 |
Note that the relevance score is computed as an average over 14 retrieval datasets (see details below).
### Inference Times

| GPU        | Quantization type | Batch size 1 | Batch size 32 |
|------------|-------------------|--------------|---------------|
| NVIDIA A10 | FP16              | 1 ms         | 5 ms          |
| NVIDIA A10 | FP32              | 2 ms         | 20 ms         |
| NVIDIA T4  | FP16              | 1 ms         | 14 ms         |
| NVIDIA T4  | FP32              | 2 ms         | 53 ms         |
| NVIDIA L4  | FP16              | 1 ms         | 5 ms          |
| NVIDIA L4  | FP32              | 3 ms         | 25 ms         |
### GPU Memory usage

| Quantization type | Memory  |
|-------------------|---------|
| FP16              | 300 MiB |
| FP32              | 500 MiB |
Note that GPU memory usage only covers how much GPU memory the model itself consumes on an NVIDIA T4 GPU with a batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization, which can be around 0.5 to 1 GiB depending on the GPU used.
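Put together, these figures give a rough capacity estimate: an FP16 deployment needs about 0.8 to 1.3 GiB of GPU memory in total (300 MiB for the model plus 0.5 to 1 GiB for the runtime), and an FP32 deployment about 1.0 to 1.5 GiB.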
### Model Details

#### Overview

- Number of parameters: 23 million
- Base language model: English MiniLM-L6-H384
- Case and accent insensitivity: insensitive to casing and accents
- Output dimensions: 256 (reduced with an additional dense layer)
- Training procedure: query-passage-negative triplets for datasets that have mined hard negative data, query-passage pairs for the rest; the number of negatives is augmented with an in-batch negative strategy (see the sketch after this list)
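To make the in-batch negative strategy concrete, here is a minimal sketch, not Sinequa's actual training code: each query's paired passage is its positive, and every other passage in the batch serves as an additional negative. The 384-to-256 linear layer stands in for the additional dense layer mentioned above.

```python
import torch
import torch.nn.functional as F

projection = torch.nn.Linear(384, 256)  # extra dense layer reducing the output dims

def in_batch_negatives_loss(query_emb: torch.Tensor,
                            passage_emb: torch.Tensor,
                            scale: float = 20.0) -> torch.Tensor:
    """Row i of passage_emb is the positive for row i of query_emb;
    all other rows in the batch act as negatives."""
    q = F.normalize(projection(query_emb), dim=-1)
    p = F.normalize(projection(passage_emb), dim=-1)
    scores = scale * (q @ p.T)             # (batch, batch) similarity matrix
    labels = torch.arange(scores.size(0))  # positives sit on the diagonal
    return F.cross_entropy(scores, labels)
```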
### Training Data

The model has been trained using all datasets cited in the all-MiniLM-L6-v2 model.
### Evaluation Metrics

To determine the relevance score, we averaged the results obtained when evaluating on the datasets of the [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
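For reference, Recall@100 for a single query can be computed as in the following minimal sketch (a hypothetical helper, shown only to define the metric); the reported scores are averages over all queries of each dataset:

```python
def recall_at_k(retrieved_ids: list, relevant_ids: set, k: int = 100) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant_ids) / len(relevant_ids)
```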
| Dataset           | Recall@100 |
|-------------------|------------|
| Average           | 0.639      |
| Arguana           | 0.969      |
| CLIMATE-FEVER     | 0.509      |
| DBPedia Entity    | 0.409      |
| FEVER             | 0.839      |
| FiQA-2018         | 0.702      |
| HotpotQA          | 0.609      |
| MS MARCO          | 0.849      |
| NFCorpus          | 0.315      |
| NQ                | 0.786      |
| Quora             | 0.995      |
| SCIDOCS           | 0.497      |
| SciFact           | 0.911      |
| TREC-COVID        | 0.129      |
| Webis-Touche-2020 | 0.427      |