ELECTRA Small Generator
ELECTRA is an efficient text encoder that achieves strong performance with far less compute by using discriminative pretraining (replaced-token detection) instead of generative pretraining (masked language modeling).
Release Time: 3/2/2022
Model Overview
The ELECTRA framework borrows the two-model structure of generative adversarial networks: a generator and a discriminator are pretrained jointly, and the discriminator learns to distinguish original tokens from generator-produced replacements. This checkpoint is the generator, used to produce plausible replacement tokens for discriminator training. Note that the paper finds generators roughly 1/4 to 1/2 the size of the discriminator work best; a generator that is too strong makes the detection task too hard and hurts results.
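As a quick check that the generator behaves like a small masked language model, the sketch below queries it with the transformers fill-mask pipeline. It assumes the checkpoint is published on the Hugging Face Hub under the id google/electra-small-generator and that transformers and PyTorch are installed.

```python
from transformers import pipeline

# Assumed Hub id for this checkpoint.
fill_mask = pipeline("fill-mask", model="google/electra-small-generator")

# The generator proposes plausible fillers for the masked position; during ELECTRA
# pretraining, sampled replacements like these are what the discriminator must
# flag as "replaced" rather than "original".
for prediction in fill_mask("The quick brown [MASK] jumps over the lazy dog."):
    print(prediction["token_str"], round(prediction["score"], 3))
```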
Model Features
Efficient Pretraining
Replaced-token detection learns from every input token rather than only the ~15% that are masked, so it is more compute-efficient than traditional generative (masked language modeling) pretraining.
Generator-Discriminator Training
Uses a GAN-like two-model setup, although the generator is trained with maximum likelihood (ordinary masked language modeling) rather than adversarially; a simplified sketch of this setup follows the feature list below.
Parameter Efficiency
Small-scale models achieve near state-of-the-art results on tasks like GLUE and SQuAD while using far less pretraining compute.
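To make the generator-discriminator setup above concrete, here is a simplified pretraining sketch, not the official training code: it masks a few tokens, lets the generator fill them in, samples replacements, and trains the discriminator to flag which positions were replaced. It assumes a companion discriminator checkpoint (google/electra-small-discriminator is the usual pairing) and the transformers ElectraForMaskedLM / ElectraForPreTraining classes; the 50x loss weight follows the paper.

```python
import torch
from transformers import AutoTokenizer, ElectraForMaskedLM, ElectraForPreTraining

tokenizer = AutoTokenizer.from_pretrained("google/electra-small-generator")
generator = ElectraForMaskedLM.from_pretrained("google/electra-small-generator")
# Assumed companion checkpoint for the discriminator half of the setup.
discriminator = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

batch = tokenizer(
    ["ELECTRA pretrains a discriminator to detect replaced tokens."],
    return_tensors="pt",
)
input_ids = batch["input_ids"]

# 1. Mask ~15% of the non-special tokens, as in standard masked language modeling.
special = torch.tensor(
    tokenizer.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True)
).bool()
mask = (torch.rand(input_ids.shape) < 0.15) & ~special.unsqueeze(0)
mask[0, 1] = True  # guarantee at least one masked position in this tiny demo
masked_ids = input_ids.clone()
masked_ids[mask] = tokenizer.mask_token_id

# 2. The generator is trained with an ordinary MLM loss (maximum likelihood, not adversarial).
mlm_labels = torch.where(mask, input_ids, torch.full_like(input_ids, -100))
gen_out = generator(input_ids=masked_ids, attention_mask=batch["attention_mask"], labels=mlm_labels)

# 3. Sample replacement tokens at the masked positions; no gradient flows through sampling.
with torch.no_grad():
    sampled = torch.distributions.Categorical(logits=gen_out.logits).sample()
corrupted_ids = torch.where(mask, sampled, input_ids)

# 4. The discriminator labels every position: 1 = replaced, 0 = original.
disc_labels = (corrupted_ids != input_ids).long()
disc_out = discriminator(
    input_ids=corrupted_ids, attention_mask=batch["attention_mask"], labels=disc_labels
)

# 5. Joint objective from the paper: MLM loss plus a heavily weighted discriminator loss.
loss = gen_out.loss + 50.0 * disc_out.loss
loss.backward()
```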
Model Capabilities
Text Encoding
Masked Language Modeling
Downstream Task Fine-tuning
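For the text-encoding capability, a minimal sketch of extracting contextual token embeddings from the generator with AutoModel, again assuming the google/electra-small-generator Hub id:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed Hub id; AutoModel loads the bare encoder without the MLM head.
tokenizer = AutoTokenizer.from_pretrained("google/electra-small-generator")
encoder = AutoModel.from_pretrained("google/electra-small-generator")

batch = tokenizer(["ELECTRA encodes text efficiently."], return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**batch)

# One contextual vector per input token: (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```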
Use Cases
Natural Language Understanding
Text Classification
Fine-tuned on the GLUE benchmark for tasks such as sentiment analysis.
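A minimal fine-tuning sketch for a GLUE task (SST-2 sentiment classification), assuming the datasets library is available. In practice the ELECTRA discriminator checkpoint, not this generator, is the one normally fine-tuned for downstream tasks; the generator is used here only to illustrate the mechanics.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "google/electra-small-generator"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The classification head is newly initialized and learned during fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# GLUE SST-2: binary sentiment classification.
dataset = load_dataset("glue", "sst2")

def tokenize(examples):
    return tokenizer(examples["sentence"], truncation=True, max_length=128)

encoded = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="electra-small-sst2",
        per_device_train_batch_size=32,
        num_train_epochs=3,
    ),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding of each batch
)
trainer.train()
```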
Question Answering
Fine-tuned on the SQuAD dataset for machine reading comprehension.
The paper reports state-of-the-art results on SQuAD 2.0 at the time of publication (achieved with the large ELECTRA model).
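For question answering, the sketch below loads the generator with a span-prediction head; the head is randomly initialized, so the decoded span is only meaningful after fine-tuning on SQuAD. The Hub id is again an assumption.

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "google/electra-small-generator"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The span-prediction head is randomly initialized until fine-tuned on SQuAD.
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "What does the ELECTRA discriminator predict?"
context = "The discriminator predicts whether each token is original or replaced."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# After SQuAD fine-tuning, the argmax start/end logits delimit the answer span.
start = outputs.start_logits.argmax(dim=-1).item()
end = outputs.end_logits.argmax(dim=-1).item()
print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))
```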