Distilbert Tokenizer 256k MLM 750k
A DistilBERT model built on a 256k-entry word embedding vocabulary and optimized through MLM training
Release Time: 4/7/2022
Model Overview
This model combines pretrained Word2Vec word embeddings with the DistilBERT architecture. It was trained on multiple corpora with masked language modeling (MLM) and is suitable for a range of natural language processing tasks.
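As a quick way to try the checkpoint, the MLM head can be queried with the `fill-mask` pipeline from `transformers`. The model ID below is an assumption derived from the model name and may need to be replaced with the actual repository path.

```python
from transformers import pipeline

# Assumed Hugging Face model ID derived from the model name; replace it with
# the actual repository path if it differs.
MODEL_ID = "vocab-transformers/distilbert-tokenizer_256k-MLM_750k"

# The fill-mask pipeline uses the MLM head to predict the masked token.
fill_mask = pipeline("fill-mask", model=MODEL_ID)

for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 4))
```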
Model Features
Large-scale Word Embeddings
Initialized with a Word2Vec word embedding matrix covering a 256k-entry vocabulary (see the snippet after this list)
Optimized MLM Training
Updated word embedding parameters during 750k training steps to enhance contextual representation
Multi-source Training Data
Trained on a 100GB corpus including C4, MSMARCO, news, Wikipedia, and S2ORC
Lightweight Architecture
Utilizes DistilBERT architecture to reduce model size while maintaining performance
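A minimal sketch of how the 256k figure can be verified: after loading the checkpoint, the tokenizer's vocabulary size and the shape of the input embedding matrix can be inspected directly (the model ID is assumed, as above).

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "vocab-transformers/distilbert-tokenizer_256k-MLM_750k"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# 256k refers to the vocabulary size: the input embedding matrix should have
# roughly 256,000 rows, one per vocabulary entry.
embedding_matrix = model.get_input_embeddings().weight
print("vocabulary size:", tokenizer.vocab_size)
print("embedding matrix shape:", tuple(embedding_matrix.shape))
```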
Model Capabilities
Text Representation Learning
Contextual Word Vector Generation (see the example after this list)
Language Model Fine-tuning
Transfer Learning for Downstream NLP Tasks
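To illustrate the contextual word vector capability, here is a short sketch, assuming the same model ID as above, that extracts one vector per token from the encoder's last hidden layer.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "vocab-transformers/distilbert-tokenizer_256k-MLM_750k"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

inputs = tokenizer("Word embeddings gain context inside the transformer.",
                   return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one contextual vector per token:
# shape (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```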
Use Cases
Information Retrieval
Search Relevance Modeling
Improving query-document matching using optimized word vectors
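A minimal sketch of query-document matching with this encoder, assuming mean pooling over token vectors and cosine similarity. An MLM-only checkpoint usually benefits from retrieval fine-tuning before production use, so treat this as a starting point rather than the intended recipe.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "vocab-transformers/distilbert-tokenizer_256k-MLM_750k"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

def embed(texts):
    """Mean-pool token vectors into one embedding per input text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # zero out padding
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

query = embed(["how do transformers encode context"])
documents = embed(["Transformers build contextual token representations.",
                   "Recipes for sourdough bread at home."])

# Cosine similarity ranks the documents against the query.
print(F.cosine_similarity(query, documents))
```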
Text Classification
Document Classification
Feature extraction based on rich word embeddings
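As a sketch of the classification use case, the encoder can be loaded with a freshly initialized sequence-classification head and then fine-tuned on labeled documents. The model ID and the three-label setup below are assumptions for illustration.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "vocab-transformers/distilbert-tokenizer_256k-MLM_750k"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# num_labels=3 is an illustrative choice; the classifier head is randomly
# initialized and must be fine-tuned on task data before its logits are useful.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=3)

inputs = tokenizer("Quarterly revenue grew faster than expected.",
                   return_tensors="pt")
print(model(**inputs).logits.shape)  # torch.Size([1, 3])
```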