
OpenSearch Neural Sparse Encoding Doc V2 Distill

Developed by opensearch-project
A sparse retrieval model built with knowledge distillation and optimized for OpenSearch. Documents are encoded by the model at ingest time, and queries require no model inference at search time, improving search relevance and efficiency over V1.
Downloads 1.8M
Release Time: 7/17/2024

Model Overview

This model encodes documents into 30,522-dimensional sparse vectors (the size of the BERT vocabulary) and computes relevance as the inner product of query and document sparse vectors, making it suitable for high-efficiency retrieval scenarios.
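Since the vectors are sparse, scoring reduces to summing weight products over the tokens a query and a document share. A minimal sketch (not the official OpenSearch API; token strings stand in for vocabulary ids):

```python
def sparse_inner_product(query_vec: dict[str, float],
                         doc_vec: dict[str, float]) -> float:
    """Relevance = sum of query_weight * doc_weight over shared tokens."""
    # Iterate over the smaller vector for efficiency.
    small, large = sorted((query_vec, doc_vec), key=len)
    return sum(w * large[t] for t, w in small.items() if t in large)

# Toy example: only tokens present in both vectors contribute.
query = {"opensearch": 1.2, "retrieval": 0.8}
doc = {"opensearch": 0.9, "sparse": 0.5, "retrieval": 1.1}
score = sparse_inner_product(query, doc)  # 1.2*0.9 + 0.8*1.1 = 1.96
```

Because most dimensions are zero, this computation maps directly onto an inverted index: only postings for the query's active tokens are ever touched.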

Model Features

Inference-free retrieval
Documents are encoded by the model once at ingest time; queries need no model inference at search time, significantly improving retrieval efficiency
Distillation optimization
Compresses model size through knowledge distillation while maintaining performance and reducing computational resource consumption
Efficient sparse retrieval
Utilizes sparse vector representation and Lucene inverted index for efficient similarity calculation
Multi-dataset training
Incorporates diverse training data like MS MARCO and Q&A pairs to enhance generalization capability
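On the document side, neural sparse encoders of this family typically derive the sparse vector from a masked-LM head using a SPLADE-style activation: token logits are mapped through log(1 + ReLU(x)) and max-pooled over the sequence, giving one non-negative weight per vocabulary entry. A hedged NumPy sketch of that activation (the model's exact recipe may differ in details):

```python
import numpy as np

def sparse_encode(logits: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """SPLADE-style pooling sketch.

    logits: (seq_len, vocab_size) masked-LM scores per token position.
    attention_mask: (seq_len,) with 1 for real tokens, 0 for padding.
    Returns a (vocab_size,) non-negative sparse document vector.
    """
    activated = np.log1p(np.maximum(logits, 0.0))  # log(1 + ReLU(x))
    activated *= attention_mask[:, None]           # zero out padding rows
    return activated.max(axis=0)                   # max-pool over positions

# Toy input: 3 token positions (last one padding), vocabulary of 5.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
mask = np.array([1, 1, 0])
vec = sparse_encode(logits, mask)
```

ReLU plus the log damping keeps weights non-negative and compresses large logits, so most vocabulary dimensions end up exactly zero and the vector stays index-friendly.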

Model Capabilities

Document vectorization encoding
Query sparse vector generation
Semantic similarity calculation
High-efficiency retrieval
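In doc-only mode the query-side vector is built without any model forward pass: the query is simply tokenized and each token receives a precomputed IDF-style weight. A toy sketch of that idea; the token table and weights here are made-up stand-ins, not the model's real values:

```python
from collections import Counter

# Hypothetical precomputed IDF-style weights (illustrative values only).
IDF = {"opensearch": 2.3, "neural": 1.7, "sparse": 1.9, "search": 0.6}

def encode_query(text: str) -> dict[str, float]:
    """Inference-free query encoding: token counts times stored weights."""
    tokens = Counter(text.lower().split())
    return {t: n * IDF[t] for t, n in tokens.items() if t in IDF}

print(encode_query("OpenSearch sparse search"))
# {'opensearch': 2.3, 'sparse': 1.9, 'search': 0.6}
```

Because this step is just a table lookup, query latency stays close to that of classic lexical (BM25-style) search while still scoring against model-encoded documents.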

Use Cases

Search engines
OpenSearch neural search
Used with OpenSearch's neural search plugin to provide semantic document retrieval
Achieves average NDCG@10 of 0.504 on BEIR benchmark
Q&A systems
Q&A pair retrieval
Rapidly retrieves answers relevant to user questions from knowledge bases
Achieves NDCG@10 of 0.528 on NQ dataset