ELECTRA Small Generator
ELECTRA is an efficient text encoder that achieves strong performance with far less compute by using discriminative pretraining (replaced-token detection) instead of generative pretraining (masked language modeling).
Release Time: 3/2/2022
Model Overview
The ELECTRA framework borrows the two-model structure of generative adversarial networks: a generator and a discriminator are pretrained jointly, and the discriminator learns to distinguish original tokens from generator-produced replacements. This checkpoint is the generator, used to produce plausible replacement tokens for discriminator training. Note that the paper finds generators roughly 1/4 to 1/2 the size of the discriminator work best; a generator that is too strong makes the detection task too hard and hurts results.
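As a quick check that the generator behaves like a small masked language model, the sketch below queries it with the transformers fill-mask pipeline. It assumes the checkpoint is published on the Hugging Face Hub under the id google/electra-small-generator and that transformers and PyTorch are installed.

```python
from transformers import pipeline

# Assumed Hub id for this checkpoint.
fill_mask = pipeline("fill-mask", model="google/electra-small-generator")

# The generator proposes plausible fillers for the masked position; during ELECTRA
# pretraining, sampled replacements like these are what the discriminator must
# flag as "replaced" rather than "original".
for prediction in fill_mask("The quick brown [MASK] jumps over the lazy dog."):
    print(prediction["token_str"], round(prediction["score"], 3))
```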
Model Features
Efficient Pretraining
Replaced-token detection learns from every input token rather than only the ~15% that are masked, so it is more compute-efficient than traditional generative (masked language modeling) pretraining.
Generator-Discriminator Training
Uses a GAN-like two-model setup, although the generator is trained with maximum likelihood (ordinary masked language modeling) rather than adversarially; a simplified sketch of this setup follows the feature list below.
Parameter Efficiency
Small-scale models achieve near state-of-the-art results on tasks like GLUE and SQuAD while using far less pretraining compute.
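To make the generator-discriminator setup above concrete, here is a simplified pretraining sketch, not the official training code: it masks a few tokens, lets the generator fill them in, samples replacements, and trains the discriminator to flag which positions were replaced. It assumes a companion discriminator checkpoint (google/electra-small-discriminator is the usual pairing) and the transformers ElectraForMaskedLM / ElectraForPreTraining classes; the 50x loss weight follows the paper.

```python
import torch
from transformers import AutoTokenizer, ElectraForMaskedLM, ElectraForPreTraining

tokenizer = AutoTokenizer.from_pretrained("google/electra-small-generator")
generator = ElectraForMaskedLM.from_pretrained("google/electra-small-generator")
# Assumed companion checkpoint for the discriminator half of the setup.
discriminator = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

batch = tokenizer(
    ["ELECTRA pretrains a discriminator to detect replaced tokens."],
    return_tensors="pt",
)
input_ids = batch["input_ids"]

# 1. Mask ~15% of the non-special tokens, as in standard masked language modeling.
special = torch.tensor(
    tokenizer.get_special_tokens_mask(input_ids[0].tolist(), already_has_special_tokens=True)
).bool()
mask = (torch.rand(input_ids.shape) < 0.15) & ~special.unsqueeze(0)
mask[0, 1] = True  # guarantee at least one masked position in this tiny demo
masked_ids = input_ids.clone()
masked_ids[mask] = tokenizer.mask_token_id

# 2. The generator is trained with an ordinary MLM loss (maximum likelihood, not adversarial).
mlm_labels = torch.where(mask, input_ids, torch.full_like(input_ids, -100))
gen_out = generator(input_ids=masked_ids, attention_mask=batch["attention_mask"], labels=mlm_labels)

# 3. Sample replacement tokens at the masked positions; no gradient flows through sampling.
with torch.no_grad():
    sampled = torch.distributions.Categorical(logits=gen_out.logits).sample()
corrupted_ids = torch.where(mask, sampled, input_ids)

# 4. The discriminator labels every position: 1 = replaced, 0 = original.
disc_labels = (corrupted_ids != input_ids).long()
disc_out = discriminator(
    input_ids=corrupted_ids, attention_mask=batch["attention_mask"], labels=disc_labels
)

# 5. Joint objective from the paper: MLM loss plus a heavily weighted discriminator loss.
loss = gen_out.loss + 50.0 * disc_out.loss
loss.backward()
```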
Model Capabilities
Text Encoding
Masked Language Modeling
Downstream Task Fine-tuning
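For the text-encoding capability, a minimal sketch of extracting contextual token embeddings from the generator with AutoModel, again assuming the google/electra-small-generator Hub id:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed Hub id; AutoModel loads the bare encoder without the MLM head.
tokenizer = AutoTokenizer.from_pretrained("google/electra-small-generator")
encoder = AutoModel.from_pretrained("google/electra-small-generator")

batch = tokenizer(["ELECTRA encodes text efficiently."], return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**batch)

# One contextual vector per input token: (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```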
Use Cases
Natural Language Understanding
Text Classification
Fine-tuned on the GLUE benchmark for tasks such as sentiment analysis.
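A minimal fine-tuning sketch for a GLUE task (SST-2 sentiment classification), assuming the datasets library is available. In practice the ELECTRA discriminator checkpoint, not this generator, is the one normally fine-tuned for downstream tasks; the generator is used here only to illustrate the mechanics.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "google/electra-small-generator"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The classification head is newly initialized and learned during fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# GLUE SST-2: binary sentiment classification.
dataset = load_dataset("glue", "sst2")

def tokenize(examples):
    return tokenizer(examples["sentence"], truncation=True, max_length=128)

encoded = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="electra-small-sst2",
        per_device_train_batch_size=32,
        num_train_epochs=3,
    ),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding of each batch
)
trainer.train()
```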
Question Answering
Fine-tuned on the SQuAD dataset for machine reading comprehension.
The paper reports state-of-the-art results on SQuAD 2.0 at the time of publication (achieved with the large ELECTRA model).
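For question answering, the sketch below loads the generator with a span-prediction head; the head is randomly initialized, so the decoded span is only meaningful after fine-tuning on SQuAD. The Hub id is again an assumption.

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "google/electra-small-generator"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The span-prediction head is randomly initialized until fine-tuned on SQuAD.
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "What does the ELECTRA discriminator predict?"
context = "The discriminator predicts whether each token is original or replaced."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# After SQuAD fine-tuning, the argmax start/end logits delimit the answer span.
start = outputs.start_logits.argmax(dim=-1).item()
end = outputs.end_logits.argmax(dim=-1).item()
print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))
```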