Vectorizer.guava

Developed by Sinequa
A vectorization tool developed by Sinequa that generates embedding vectors from input paragraphs or queries for sentence similarity calculation and retrieval tasks.
Downloads 204
Release Time: 10/9/2024

Model Overview

This multilingual sentence embedding model converts text paragraphs or queries into 256-dimensional embedding vectors that can be stored and searched to retrieve similar content. It supports multiple languages, with special optimization for 11 major languages including English, French, and German.

Model Features

Multilingual support
Specifically trained to support 11 major languages while remaining compatible with the 91 languages in the base model's pretraining
Efficient inference
Requires only 1 ms for a single query and 5 ms for 32 queries in FP16 mode on an NVIDIA A10 GPU
Case insensitivity
Insensitive to text case and accents, improving retrieval robustness
Dimensionality reduction
Reduces output dimensions to 256 through an additional dense layer, optimizing storage and retrieval efficiency
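
The case and accent insensitivity listed above happens inside the model, and this page does not describe how. As a rough illustration of what such insensitivity means for input text, the sketch below applies a comparable normalization on the application side using only the Python standard library. The function name normalize_text is hypothetical and is not part of the model or any Sinequa API.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Return a case-folded, accent-stripped copy of text (illustrative only)."""
    # NFKD decomposition splits an accented character into its base
    # character followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lowercase suited to caseless matching.
    return stripped.casefold()


if __name__ == "__main__":
    # Differently cased and accented spellings collapse to the same string.
    print(normalize_text("Café Élysée"))
    print(normalize_text("CAFE ELYSEE"))
```

With an embedding model that is already insensitive to case and accents, this step should not be needed before vectorization. It is shown only to make the property concrete.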

Model Capabilities

Multilingual text vectorization
Sentence similarity calculation
Semantic retrieval
Cross-language text matching
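
Sentence similarity and semantic retrieval over embedding vectors are commonly implemented with cosine similarity. This page does not state which similarity measure the model expects, so the following is a minimal sketch under that assumption. The 4-dimensional toy vectors stand in for the model's 256-dimensional output, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [
        (idx, cosine_similarity(query_vec, vec))
        for idx, vec in enumerate(passage_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumed values, not real model output).
query = [1.0, 0.0, 1.0, 0.0]
passages = [
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
    [2.0, 0.0, 2.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0, 0.0],  # partially aligned with the query
]

if __name__ == "__main__":
    for idx, score in rank_passages(query, passages):
        print(idx, round(score, 3))
```

Cross-language matching works the same way in this sketch: if the query and passage vectors come from texts in different languages, they are still compared in the same vector space.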

Use Cases

Information retrieval
Document retrieval system
Build semantic document retrieval systems that return the most relevant document paragraphs for a given query
Achieves Recall@100 of 0.616 on English datasets
Multilingual applications
Cross-language content recommendation
Provides content recommendation functionality for multilingual websites, matching similar content across different languages
Achieves Recall@100 of 0.738 on the Traditional Chinese msmarco dataset
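
The Recall@100 figures above are not defined on this page. Assuming the common definition (the fraction of a query's relevant passages that appear among the top k retrieved results, averaged over all queries), the metric can be sketched as follows. The function names and the toy document IDs are hypothetical.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant items found in the first k retrieved items."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved_ids, relevant_ids) pairs."""
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Toy evaluation data (assumed, not taken from any benchmark).
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),  # k=3 finds d1 but misses d2
    (["d5", "d9", "d4", "d8"], {"d9"}),        # k=3 finds the only relevant item
]

if __name__ == "__main__":
    print(mean_recall_at_k(runs, k=3))
```

Under this definition, a Recall@100 of 0.616 would mean that, on average, about 62% of each query's relevant passages appear in the top 100 results.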