
Dense Encoder Msmarco Distilbert Word2vec256k Emb Updated

Developed by vocab-transformers
A sentence embedding model based on the DistilBERT architecture, with a 256k-entry vocabulary whose embeddings are initialized from word2vec. Trained on the MS MARCO dataset, it is suited to sentence similarity computation and semantic search tasks.
Downloads 31
Release Time: 3/2/2022

Model Overview

This model is a sentence embedding model that converts text into 768-dimensional dense vectors, primarily used for tasks such as sentence similarity computation, semantic search, and information retrieval.

Model Features

Word2Vec Initialization
Token embeddings for the 256k-entry vocabulary are initialized from word2vec vectors, improving the quality of the starting word representations.
Efficient Architecture
Based on the DistilBERT architecture, reducing model size while maintaining performance.
Specialized Training
Trained on the MS MARCO dataset using MarginMSELoss, optimizing performance for retrieval tasks.
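MarginMSELoss trains the dense encoder (the student) to reproduce a stronger teacher model's score margin between a positive and a negative passage for each query. A minimal numpy sketch of the idea, with illustrative names rather than the sentence-transformers API:

```python
import numpy as np

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins.

    Each argument is an array of query-passage relevance scores for a batch:
    student_* come from the dense encoder (e.g. dot products of embeddings),
    teacher_* from a cross-encoder that supervises training.
    """
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return float(np.mean((student_margin - teacher_margin) ** 2))

# Toy batch of two queries: margins are (6.0, 3.5) for the student
# and (6.0, 2.5) for the teacher, so the loss is (0 + 1.0) / 2 = 0.5.
loss = margin_mse_loss(
    np.array([8.0, 6.5]), np.array([2.0, 3.0]),   # student scores
    np.array([7.5, 6.0]), np.array([1.5, 3.5]),   # teacher scores
)
```

Because the loss supervises score *differences* rather than absolute labels, the student learns a ranking consistent with the teacher's, which is what matters for retrieval.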

Model Capabilities

Sentence Embedding Generation
Semantic Similarity Computation
Information Retrieval
Text Clustering
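In practice the model is typically loaded through the sentence-transformers library and texts are compared by cosine similarity of their 768-dimensional embeddings. The sketch below shows the similarity math on dummy vectors; the model id in the comment is assumed from this page's title and may differ:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Loading and encoding with sentence-transformers (requires the package and
# a model download; model id assumed from the page title):
#
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer(
#     "vocab-transformers/dense_encoder-msmarco-distilbert-word2vec256k-emb_updated"
# )
# emb = model.encode(["How do dense retrievers work?",
#                     "Dense retrieval maps text to vectors."])
# print(cos_sim(emb[0], emb[1]))

# Without the model, the similarity math can be checked on dummy 768-d vectors:
rng = np.random.default_rng(0)
v = rng.normal(size=768)
same = cos_sim(v, v)        # identical vectors -> 1.0
opposite = cos_sim(v, -v)   # opposite vectors -> -1.0
```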

Use Cases

Information Retrieval
Document Retrieval System
Building a document retrieval system based on semantic similarity.
Achieved MRR@10 of 34.51 on the MS MARCO dataset.
Question Answering System
Question-Answer Matching
Matches user questions against candidate answers by embedding similarity within a question-answering system.
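MRR@10, the metric quoted above for the document retrieval use case, is the mean over queries of the reciprocal rank of the first relevant document within the top 10 results. A minimal sketch (illustrative helper, not part of any library):

```python
def mrr_at_10(ranked_relevance):
    """Compute MRR@10.

    ranked_relevance: for each query, a list of booleans marking whether each
    retrieved document is relevant, in rank order. A query with no relevant
    document in the top 10 contributes 0.
    """
    total = 0.0
    for rels in ranked_relevance:
        for rank, is_rel in enumerate(rels[:10], start=1):
            if is_rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_relevance)

# Two queries: first relevant hit at rank 1 and rank 2 -> (1 + 0.5) / 2 = 0.75
score = mrr_at_10([[True, False], [False, True, False]])
```

So the reported 34.51 corresponds to an MRR@10 of 0.3451 on this scale, i.e. the first relevant passage appears on average around rank 3.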