# Msmarco Distilbert Word2vec256k MLM 230k
This is a pre-trained language model based on the DistilBERT architecture. Its 256k-entry vocabulary was initialized with word2vec embeddings, and the model was then trained on the MS MARCO corpus with a masked language modeling (MLM) objective for 230k steps, as indicated by the model name.
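A minimal usage sketch with the Hugging Face Transformers library is shown below. The repository ID `vocab-transformers/msmarco-distilbert-word2vec256k-MLM_230k` is an assumption inferred from the model title; substitute the actual Hub ID of this model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed Hub ID, inferred from the model title -- replace with the real one.
model_id = "vocab-transformers/msmarco-distilbert-word2vec256k-MLM_230k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Fill-mask example: predict the token hidden behind the mask.
text = f"MS MARCO is a large-scale {tokenizer.mask_token} dataset."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and print the top-5 predicted tokens.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids))
```

Because the checkpoint was trained with MLM only, it is best used as a starting point for fine-tuning (e.g., on retrieval or ranking tasks) rather than as a ready-made sentence encoder.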