# 🚀 Model Card for vectorizer.vanilla

This model, `vectorizer.vanilla`, developed by Sinequa, is a vectorizer that generates an embedding vector for a given passage or query. Passage vectors are stored in the vector index, and the query vector is used at query time to search for relevant passages in the index.
## 🚀 Quick Start

This model is ready to generate embedding vectors for passages and queries. You can start using it in any Sinequa environment that meets the requirements listed below.
## ✨ Features

- Multilingual Support: Although trained and tested mainly in English, it has the potential for broader language applications.
- Efficient Inference: Offers fast inference times on various NVIDIA GPUs across different quantization types and batch sizes.
- Low Memory Usage: Consumes relatively little GPU memory, especially with FP16 quantization.
## 📦 Installation

### Requirements

- Sinequa version:
  - Minimal Sinequa version: 11.10.0
  - Minimal Sinequa version for using FP16 models and GPUs with CUDA compute capability of 8.9+ (like NVIDIA L4): 11.11.0
- CUDA compute capability: above 5.0 (above 6.0 for FP16 use); see the sketch below for a quick way to check this.
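You can verify your GPU's compute capability before enabling FP16. This is a minimal sketch assuming PyTorch with CUDA support is installed in your environment; it is not part of the Sinequa API:

```python
import torch  # assumption: PyTorch with CUDA support is available

def meets_requirement(fp16: bool = False) -> bool:
    """Check the local GPU against the compute capability requirements above."""
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability(0)
    required = (6, 0) if fp16 else (5, 0)  # above 6.0 for FP16, above 5.0 otherwise
    return (major, minor) > required

print(meets_requirement(fp16=True))
```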
## 💻 Usage Examples

### Basic Usage

The model generates embedding vectors for passages and queries in your Sinequa application. Here is a high-level sketch of how it might be used:
```python
# Illustrative pseudocode: `sinequa_model_api` and `get_vector` are placeholders
# for the actual Sinequa integration, which is configured in the product itself.
from sinequa_model_api import get_vector

passage = "This is a sample passage."
query = "Find relevant passages."
passage_vector = get_vector(passage)  # stored in the vector index at indexing time
query_vector = get_vector(query)      # used to search the index at query time
```
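At query time, relevance is determined by comparing the query vector against the stored passage vectors. The following is a minimal, self-contained sketch of such a search using cosine similarity over an in-memory matrix; the scoring used inside Sinequa's actual vector index is not specified here, so treat this as an assumption:

```python
import numpy as np

def search(query_vector, passage_vectors, top_k=5):
    """Rank passages by cosine similarity to the query vector."""
    q = np.asarray(query_vector, dtype=np.float32)
    p = np.asarray(passage_vectors, dtype=np.float32)  # shape: (n_passages, 256)
    q = q / np.linalg.norm(q)                          # normalize so the dot product
    p = p / np.linalg.norm(p, axis=1, keepdims=True)   # equals cosine similarity
    scores = p @ q
    top = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in top]
```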
## 📚 Documentation

### Supported Languages

The model was trained and tested in the following language:

- English (`en`)
### Scores

| Metric                 | Value |
|------------------------|-------|
| Relevance (Recall@100) | 0.639 |
Note that the relevance score is computed as an average over 14 retrieval datasets (see details below).
### Inference Times

| GPU        | Quantization type | Batch size 1 | Batch size 32 |
|------------|-------------------|--------------|---------------|
| NVIDIA A10 | FP16              | 1 ms         | 5 ms          |
| NVIDIA A10 | FP32              | 2 ms         | 20 ms         |
| NVIDIA T4  | FP16              | 1 ms         | 14 ms         |
| NVIDIA T4  | FP32              | 2 ms         | 53 ms         |
| NVIDIA L4  | FP16              | 1 ms         | 5 ms          |
| NVIDIA L4  | FP32              | 3 ms         | 25 ms         |
### GPU Memory usage

| Quantization type | Memory  |
|-------------------|---------|
| FP16              | 300 MiB |
| FP32              | 500 MiB |
Note that GPU memory usage only covers how much GPU memory the model itself consumes on an NVIDIA T4 GPU with a batch size of 32. It does not include the fixed amount of memory consumed by the ONNX Runtime upon initialization, which can be around 0.5 to 1 GiB depending on the GPU used.
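Put together, these figures give a rough capacity estimate: an FP16 deployment needs about 0.8 to 1.3 GiB of GPU memory in total (300 MiB for the model plus 0.5 to 1 GiB for the runtime), and an FP32 deployment about 1.0 to 1.5 GiB.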
### Model Details

#### Overview

- Number of parameters: 23 million
- Base language model: English MiniLM-L6-H384
- Case and accent insensitivity: insensitive to casing and accents
- Output dimensions: 256 (reduced with an additional dense layer)
- Training procedure: query-passage-negative triplets for datasets that have mined hard negative data, query-passage pairs for the rest; the number of negatives is augmented with an in-batch negative strategy (see the sketch after this list)
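To make the in-batch negative strategy concrete, here is a minimal sketch, not Sinequa's actual training code: each query's paired passage is its positive, and every other passage in the batch serves as an additional negative. The 384-to-256 linear layer stands in for the additional dense layer mentioned above.

```python
import torch
import torch.nn.functional as F

projection = torch.nn.Linear(384, 256)  # extra dense layer reducing the output dims

def in_batch_negatives_loss(query_emb: torch.Tensor,
                            passage_emb: torch.Tensor,
                            scale: float = 20.0) -> torch.Tensor:
    """Row i of passage_emb is the positive for row i of query_emb;
    all other rows in the batch act as negatives."""
    q = F.normalize(projection(query_emb), dim=-1)
    p = F.normalize(projection(passage_emb), dim=-1)
    scores = scale * (q @ p.T)             # (batch, batch) similarity matrix
    labels = torch.arange(scores.size(0))  # positives sit on the diagonal
    return F.cross_entropy(scores, labels)
```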
### Training Data

The model has been trained using all datasets cited in the all-MiniLM-L6-v2 model.
### Evaluation Metrics

To determine the relevance score, we averaged the results obtained when evaluating on the datasets of the [BEIR benchmark](https://github.com/beir-cellar/beir). Note that all these datasets are in English.
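For reference, Recall@100 for a single query can be computed as in the following minimal sketch (a hypothetical helper, shown only to define the metric); the reported scores are averages over all queries of each dataset:

```python
def recall_at_k(retrieved_ids: list, relevant_ids: set, k: int = 100) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant_ids) / len(relevant_ids)
```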
| Dataset           | Recall@100 |
|-------------------|------------|
| Average           | 0.639      |
| Arguana           | 0.969      |
| CLIMATE-FEVER     | 0.509      |
| DBPedia Entity    | 0.409      |
| FEVER             | 0.839      |
| FiQA-2018         | 0.702      |
| HotpotQA          | 0.609      |
| MS MARCO          | 0.849      |
| NFCorpus          | 0.315      |
| NQ                | 0.786      |
| Quora             | 0.995      |
| SCIDOCS           | 0.497      |
| SciFact           | 0.911      |
| TREC-COVID        | 0.129      |
| Webis-Touche-2020 | 0.427      |