
Bert L12 H384 A6

Developed by eli4s
A lightweight BERT model pre-trained on the BookCorpus dataset via knowledge distillation, with the hidden dimension reduced to 384 and 6 attention heads.
Release Time: 3/2/2022

Model Overview

This model is a lightweight BERT variant pre-trained with knowledge distillation and intended for masked language modeling tasks.
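As a quick usage sketch (not part of the original card): assuming the checkpoint is published on the Hugging Face Hub under an id such as `eli4s/Bert-L12-h384-A6` (inferred from the title) and loads with the standard masked-LM classes, a `[MASK]` token can be predicted as follows. If the repository ships a custom prediction head, the exact loading call may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "eli4s/Bert-L12-h384-A6"  # assumed Hub id, inferred from the title
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring token.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))
```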

Model Features

Lightweight Design
The hidden dimension is reduced to 384 (half of BERT-base's 768) and the model uses 6 attention heads, so the per-head dimension (64) matches BERT-base. A configuration sketch follows this list.
Knowledge Distillation
Pre-trained with knowledge distillation and optimized with a combination of several loss functions (see the sketch after this list).
Random Initialization
The model weights were randomly initialized before distillation pre-training, rather than copied from an existing BERT checkpoint.
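To make the architecture numbers concrete, here is a minimal configuration sketch using transformers' `BertConfig`. The intermediate (feed-forward) size is an assumption following the usual 4x-hidden BERT convention; the original card does not state it.

```python
from transformers import BertConfig

# Hypothetical configuration matching the card's L12 / H384 / A6 numbers.
config = BertConfig(
    num_hidden_layers=12,    # L12: same depth as BERT-base
    hidden_size=384,         # H384: half of BERT-base's 768
    num_attention_heads=6,   # A6: per-head dim 384 / 6 = 64, as in BERT-base
    intermediate_size=1536,  # assumption: 4 * hidden_size, the usual BERT ratio
)
```

The card only says that several loss functions were combined. A common recipe for BERT distillation (e.g. DistilBERT-style) mixes a temperature-scaled KL term on the teacher's soft logits with the ordinary hard-label MLM cross-entropy; the sketch below illustrates that recipe. The weighting `alpha` and temperature `T` are placeholders, not the values actually used for this model.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Illustrative multi-loss for MLM distillation (assumed recipe, not the
    original training code): soft-target KL + hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale by T^2 to keep gradient magnitudes comparable
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,  # only masked positions carry labels
    )
    return alpha * soft + (1 - alpha) * hard
```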

Model Capabilities

Masked Language Prediction
Text Understanding

Use Cases

Natural Language Processing
Text Completion
Predict the masked words in a sentence; multiple candidate words can be returned for selection (see the pipeline example below).
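For the multiple-candidates use case, the fill-mask pipeline's `top_k` parameter returns several ranked completions. The model id is again an assumption inferred from the title:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="eli4s/Bert-L12-h384-A6")  # assumed id
for candidate in fill_mask("The weather today is [MASK].", top_k=5):
    print(f'{candidate["token_str"]}: {candidate["score"]:.4f}')
```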