
dense_encoder-msmarco-distilbert-word2vec256k-MLM_445k (emb updated)

Developed by: vocab-transformers
A sentence embedding model trained on the MS MARCO dataset, using a word2vec-initialized 256k-token vocabulary on the DistilBERT architecture; suitable for semantic search and sentence-similarity tasks.
Downloads: 29
Released: 3/2/2022

Model Overview

This sentence embedding model maps sentences and paragraphs into a 768-dimensional dense vector space, making it suitable for natural language processing tasks such as clustering and semantic search.
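Once text is mapped into this dense vector space, semantic comparison reduces to vector arithmetic. A minimal sketch of cosine-similarity lookup over toy 768-dimensional vectors (placeholder random data, not real model outputs; in practice the vectors would come from encoding text with this model, e.g. via the sentence-transformers library):

```python
import numpy as np

# Toy stand-ins for model outputs: 3 corpus embeddings and 1 query
# embedding, each 768-dimensional (real values would come from the model).
rng = np.random.default_rng(0)
corpus = rng.normal(size=(3, 768))
query = corpus[1] + 0.01 * rng.normal(size=768)  # query close to doc 1

def cosine_sim(query_vec, matrix):
    """Cosine similarity between a vector and each row of a matrix."""
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

scores = cosine_sim(query, corpus)
best = int(np.argmax(scores))  # index of the semantically closest document
```

With the seeded toy data above, the query vector is a lightly perturbed copy of document 1, so the lookup returns index 1.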

Model Features

Word2Vec-Initialized Vocabulary
Uses a 256k-token vocabulary initialized with word2vec, improving the quality of the input word embeddings.
MS MARCO Dataset Training
Trained on the MS MARCO dataset with MarginMSELoss, optimizing for semantic search.
High-Performance Sentence Embeddings
Achieves nDCG@10 scores of 66.72 on TREC-DL 2019 and 69.14 on TREC-DL 2020.
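MarginMSELoss, named above, trains the student model to reproduce a teacher's score *margin* between a positive and a negative passage for each query, rather than the absolute scores. A minimal numpy sketch of the loss computation itself (toy scores, not values from this model's training run):

```python
import numpy as np

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """MSE between the student's and teacher's (pos - neg) score margins."""
    student_margin = np.asarray(student_pos) - np.asarray(student_neg)
    teacher_margin = np.asarray(teacher_pos) - np.asarray(teacher_neg)
    return float(np.mean((student_margin - teacher_margin) ** 2))

# Toy scores for two (query, positive, negative) triples.
loss = margin_mse_loss(
    student_pos=[2.0, 1.0], student_neg=[0.0, 0.0],
    teacher_pos=[1.5, 1.5], teacher_neg=[0.0, 0.0],
)
```

Here the student margins are [2.0, 1.0] against teacher margins of [1.5, 1.5], giving squared errors of 0.25 each and a loss of 0.25; matching the teacher's margins exactly drives the loss to zero.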

Model Capabilities

Sentence Embedding
Semantic Search
Text Clustering
Information Retrieval
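The nDCG@10 figures quoted in the features above follow the standard normalized discounted-cumulative-gain definition: gains are discounted by log2 of the rank position and normalized by the ideal ordering. A small sketch with toy relevance judgments:

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Toy graded relevance of retrieved results, in ranked order.
score = ndcg_at_k([3, 2, 0, 1], k=10)
```

A ranking that already sorts results by relevance scores exactly 1.0; the toy ranking above is penalized slightly for placing the relevance-1 result below an irrelevant one.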

Use Cases

Information Retrieval
Document Retrieval System
Builds a document retrieval system that matches documents to queries by semantic similarity rather than exact keywords.
Achieves an MRR@10 of 34.94 on the MS MARCO development set.
Question Answering System
Question Matching
Matches semantically similar questions in a question-answering system.
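The MRR@10 figure cited for the retrieval use case is the mean, over all queries, of the reciprocal rank of the first relevant document within the top 10 results (contributing 0 when none appears). A minimal sketch with toy rank data:

```python
def mrr_at_10(first_relevant_ranks):
    """Mean reciprocal rank at cutoff 10.

    Each entry is the 1-based rank of the first relevant document for a
    query, or None if no relevant document appears in the top 10.
    """
    total = 0.0
    for rank in first_relevant_ranks:
        if rank is not None and rank <= 10:
            total += 1.0 / rank
    return total / len(first_relevant_ranks)

# Toy data: relevant doc at rank 1, at rank 3, and missing for one query.
score = mrr_at_10([1, 3, None])
```

For the toy data this gives (1 + 1/3 + 0) / 3 = 4/9 ≈ 0.444; the reported 34.94 would correspond to a score of about 0.3494 averaged over the MS MARCO dev queries.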