Bert L12 H384 A6
Developed by eli4s
A lightweight BERT model pre-trained on the BookCorpus dataset via knowledge distillation, with the hidden dimension reduced to 384 and 6 attention heads.
Release Time: 3/2/2022
Model Overview
This model is a lightweight BERT variant pre-trained with knowledge distillation and suited to masked language modeling tasks.
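A minimal sketch of using the model for masked language modeling with the transformers fill-mask pipeline. The repo id "eli4s/Bert-L12-h384-A6" is assumed from the model name and author above, and the example sentence is illustrative:

```python
# Minimal fill-mask sketch; assumes the Hugging Face repo id
# "eli4s/Bert-L12-h384-A6" (inferred from the model name and author).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="eli4s/Bert-L12-h384-A6")

# Predict the token hidden behind [MASK]; the pipeline returns the
# top candidates with their scores.
for prediction in fill_mask("The book was left on the [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 4))
```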
Model Features
Lightweight Design
The hidden dimension is reduced to 384 (half of BERT-base's 768), with 6 attention heads so that the per-head dimension (64) matches BERT's; see the config check after this list.
Knowledge Distillation
Pre-trained via knowledge distillation, optimized with multiple loss functions.
Random Initialization
The model weights were randomly initialized before pre-training.
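The architecture claims above can be checked directly from the model's configuration. This sketch assumes the same repo id as before; the expected values follow from the stated design (12 layers, hidden size 384, 6 heads):

```python
# Inspect the model config to verify the lightweight design;
# repo id is assumed as above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("eli4s/Bert-L12-h384-A6")
print(config.num_hidden_layers)    # expected: 12
print(config.hidden_size)          # expected: 384 (half of BERT-base's 768)
print(config.num_attention_heads)  # expected: 6
# Per-head dimension matches BERT-base: 384 / 6 == 768 / 12 == 64
print(config.hidden_size // config.num_attention_heads)
```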
Model Capabilities
Masked Language Prediction
Text Understanding
Use Cases
Natural Language Processing
Text Completion
Predicts the masked word in a sentence.
Can generate multiple candidate words to choose from, as shown in the sketch below.
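A sketch of producing several candidate words for one masked slot by ranking the model's output logits directly, rather than through the pipeline. The repo id and example sentence are again assumptions:

```python
# Generate multiple candidates for a masked position; repo id assumed.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo_id = "eli4s/Bert-L12-h384-A6"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)

inputs = tokenizer("She poured herself a cup of [MASK].", return_tensors="pt")
# Locate the masked position in the tokenized input.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits

# Take the five highest-scoring vocabulary entries for the masked slot.
top_ids = logits[0, mask_pos].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```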