🚀 ALBERT Large v2
A pre-trained English language model trained with a masked language modeling (MLM) objective, offering strong feature extraction for downstream tasks.
🚀 Quick Start
This ALBERT Large v2 model is a pre-trained English language model. It can be used directly for masked language modeling or sentence order prediction, but it is mainly intended to be fine-tuned on downstream tasks. You can find fine-tuned versions on the model hub.
✨ Features
- Bidirectional Representation Learning: Through masked language modeling (MLM), the model learns a bidirectional representation of sentences, unlike traditional RNNs and autoregressive models that see tokens one after another.
- Sentence Order Prediction: ALBERT uses a pretraining loss based on predicting the ordering of two consecutive text segments.
- Layer Sharing: ALBERT shares its layer parameters across the Transformer, resulting in a small memory footprint; see the sketch after this list for a quick check.
- Improved Version 2: Version 2 performs better on nearly all downstream tasks thanks to different dropout rates, additional training data, and longer training.
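To illustrate the effect of layer sharing, here is a minimal sketch that counts the parameters of the published checkpoint (an illustrative check, not part of the original card; it assumes the `albert-large-v2` checkpoint and a PyTorch install):

```python
from transformers import AlbertModel

# Despite its 24 repeating layers, ALBERT Large stays at roughly 17M-18M
# parameters because the Transformer layers share their weights.
model = AlbertModel.from_pretrained("albert-large-v2")
num_params = sum(p.numel() for p in model.parameters())
print(f"albert-large-v2 parameters: {num_params / 1e6:.1f}M")
```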
📦 Installation
The usage examples below rely on the 🤗 Transformers library, which can be installed with `pip install transformers` (plus PyTorch or TensorFlow, depending on which backend you want to use).
💻 Usage Examples
Basic Usage
You can use this model directly with a pipeline for masked language modeling:
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='albert-large-v2')
>>> unmasker("Hello I'm a [MASK] model.")
[
   {
      "sequence":"[CLS] hello i'm a modeling model.[SEP]",
      "score":0.05816134437918663,
      "token":12807,
      "token_str":"▁modeling"
   },
   {
      "sequence":"[CLS] hello i'm a modelling model.[SEP]",
      "score":0.03748830780386925,
      "token":23089,
      "token_str":"▁modelling"
   },
   {
      "sequence":"[CLS] hello i'm a model model.[SEP]",
      "score":0.033725276589393616,
      "token":1061,
      "token_str":"▁model"
   },
   {
      "sequence":"[CLS] hello i'm a runway model.[SEP]",
      "score":0.017313428223133087,
      "token":8014,
      "token_str":"▁runway"
   },
   {
      "sequence":"[CLS] hello i'm a lingerie model.[SEP]",
      "score":0.014405295252799988,
      "token":29104,
      "token_str":"▁lingerie"
   }
]
```
Advanced Usage
Get Features in PyTorch
```python
from transformers import AlbertTokenizer, AlbertModel

# Load the tokenizer and the pre-trained model.
tokenizer = AlbertTokenizer.from_pretrained('albert-large-v2')
model = AlbertModel.from_pretrained('albert-large-v2')

# Tokenize the input text and run it through the model to extract features.
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
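Continuing from the snippet above, the returned object exposes the extracted features; a minimal sketch of inspecting them (assuming the default `AlbertModel` output format, with a hidden size of 1024 for this checkpoint):

```python
# Per-token hidden states and the pooled sentence-level representation.
print(output.last_hidden_state.shape)  # (batch_size, sequence_length, 1024)
print(output.pooler_output.shape)      # (batch_size, 1024)
```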
Get Features in TensorFlow
```python
from transformers import AlbertTokenizer, TFAlbertModel

# Load the tokenizer and the pre-trained model.
tokenizer = AlbertTokenizer.from_pretrained('albert-large-v2')
model = TFAlbertModel.from_pretrained('albert-large-v2')

# Tokenize the input text and run it through the model to extract features.
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
📚 Documentation
Model Configuration
This model has the following configuration:
| Property | Details |
|---|---|
| Repeating Layers | 24 |
| Embedding Dimension | 128 |
| Hidden Dimension | 1024 |
| Attention Heads | 16 |
| Parameters | 17M |
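These values can also be checked programmatically; a minimal sketch using `AlbertConfig` (an illustrative check, assuming the checkpoint is reachable on the Hugging Face Hub):

```python
from transformers import AlbertConfig

config = AlbertConfig.from_pretrained("albert-large-v2")
print(config.num_hidden_layers)    # 24 repeating layers
print(config.embedding_size)       # 128 embedding dimension
print(config.hidden_size)          # 1024 hidden dimension
print(config.num_attention_heads)  # 16 attention heads
```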
Intended Uses & Limitations
This model is primarily intended to be fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation, you should look at models like GPT-2.
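As an example of such a setup, here is a minimal sketch that loads the checkpoint with a sequence-classification head (the two-label configuration and the input sentence are illustrative assumptions; the classification head is newly initialized and still needs fine-tuning before its outputs are meaningful):

```python
from transformers import AlbertTokenizer, AlbertForSequenceClassification

tokenizer = AlbertTokenizer.from_pretrained("albert-large-v2")
# The encoder weights are pre-trained; the classification head is randomly
# initialized and must be fine-tuned on a labelled dataset.
model = AlbertForSequenceClassification.from_pretrained("albert-large-v2", num_labels=2)

inputs = tokenizer("Replace me by any text you'd like.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch_size, num_labels) -> (1, 2)
```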
Limitations and Bias
Even if the training data is fairly neutral, this model can have biased predictions. For example:
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='albert-large-v2')
>>> unmasker("The man worked as a [MASK].")
[
   {
      "sequence":"[CLS] the man worked as a chauffeur.[SEP]",
      "score":0.029577180743217468,
      "token":28744,
      "token_str":"▁chauffeur"
   },
   {
      "sequence":"[CLS] the man worked as a janitor.[SEP]",
      "score":0.028865724802017212,
      "token":29477,
      "token_str":"▁janitor"
   },
   {
      "sequence":"[CLS] the man worked as a shoemaker.[SEP]",
      "score":0.02581118606030941,
      "token":29024,
      "token_str":"▁shoemaker"
   },
   {
      "sequence":"[CLS] the man worked as a blacksmith.[SEP]",
      "score":0.01849772222340107,
      "token":21238,
      "token_str":"▁blacksmith"
   },
   {
      "sequence":"[CLS] the man worked as a lawyer.[SEP]",
      "score":0.01820771023631096,
      "token":3672,
      "token_str":"▁lawyer"
   }
]

>>> unmasker("The woman worked as a [MASK].")
[
   {
      "sequence":"[CLS] the woman worked as a receptionist.[SEP]",
      "score":0.04604868218302727,
      "token":25331,
      "token_str":"▁receptionist"
   },
   {
      "sequence":"[CLS] the woman worked as a janitor.[SEP]",
      "score":0.028220869600772858,
      "token":29477,
      "token_str":"▁janitor"
   },
   {
      "sequence":"[CLS] the woman worked as a paramedic.[SEP]",
      "score":0.0261906236410141,
      "token":23386,
      "token_str":"▁paramedic"
   },
   {
      "sequence":"[CLS] the woman worked as a chauffeur.[SEP]",
      "score":0.024797942489385605,
      "token":28744,
      "token_str":"▁chauffeur"
   },
   {
      "sequence":"[CLS] the woman worked as a waitress.[SEP]",
      "score":0.024124596267938614,
      "token":13678,
      "token_str":"▁waitress"
   }
]
```
This bias will also affect all fine-tuned versions of this model.
Training Data
The ALBERT model was pretrained on BookCorpus, a dataset consisting of 11,038 unpublished books, and on English Wikipedia (excluding lists, tables and headers).
Training Procedure
Preprocessing
The texts are lowercased and tokenized using SentencePiece and a vocabulary size of 30,000. The inputs of the model are then of the form:
```
[CLS] Sentence A [SEP] Sentence B [SEP]
```
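A minimal sketch showing how the tokenizer produces this layout for a sentence pair (the example sentences are illustrative, and the decoded string is approximate):

```python
from transformers import AlbertTokenizer

tokenizer = AlbertTokenizer.from_pretrained("albert-large-v2")
# Encoding a sentence pair lowercases the text and adds the [CLS] and [SEP]
# special tokens in the layout described above.
encoded = tokenizer("Sentence A", "Sentence B")
print(tokenizer.decode(encoded["input_ids"]))
# roughly: [CLS] sentence a[SEP] sentence b[SEP]
```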
Training
The ALBERT procedure follows the BERT setup. The details of the masking procedure for each sentence are the following (a sketch of this rule is shown after the list):
- 15% of the tokens are masked.
- In 80% of the cases, the masked tokens are replaced by `[MASK]`.
- In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).
- In the remaining 10% of the cases, the masked tokens are left as is.
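A minimal sketch of this 80/10/10 rule (illustrative only, not the original training code; the function name, the `-100` ignore index, and the sampling details are assumptions following common Transformers conventions):

```python
import random

def mask_tokens(token_ids, mask_token_id, vocab_size, mlm_probability=0.15):
    """Apply the 80/10/10 masking rule described above to a list of token ids."""
    inputs, labels = [], []
    for tok in token_ids:
        if random.random() < mlm_probability:
            labels.append(tok)  # this position contributes to the MLM loss
            r = random.random()
            if r < 0.8:
                inputs.append(mask_token_id)                 # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(random.randrange(vocab_size))  # 10%: random token
                # (the original recipe samples a token different from the one
                #  it replaces; that check is omitted here for brevity)
            else:
                inputs.append(tok)                           # 10%: keep the original token
        else:
            labels.append(-100)  # ignored by the loss
            inputs.append(tok)
    return inputs, labels
```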
Evaluation Results
When fine-tuned on downstream tasks, the ALBERT models achieve the following results:

| | Average | SQuAD1.1 | SQuAD2.0 | MNLI | SST-2 | RACE |
|---|---|---|---|---|---|---|
| V2 | | | | | | |
| ALBERT-base | 82.3 | 90.2/83.2 | 82.1/79.3 | 84.6 | 92.9 | 66.8 |
| ALBERT-large | 85.7 | 91.8/85.2 | 84.9/81.8 | 86.5 | 94.9 | 75.2 |
| ALBERT-xlarge | 87.9 | 92.9/86.4 | 87.9/84.1 | 87.9 | 95.4 | 80.7 |
| ALBERT-xxlarge | 90.9 | 94.6/89.1 | 89.8/86.9 | 90.6 | 96.8 | 86.8 |
| V1 | | | | | | |
| ALBERT-base | 80.1 | 89.3/82.3 | 80.0/77.1 | 81.6 | 90.3 | 64.0 |
| ALBERT-large | 82.4 | 90.6/83.9 | 82.3/79.4 | 83.5 | 91.7 | 68.5 |
| ALBERT-xlarge | 85.5 | 92.5/86.1 | 86.1/83.1 | 86.4 | 92.4 | 74.8 |
| ALBERT-xxlarge | 91.0 | 94.8/89.3 | 90.2/87.4 | 90.8 | 96.9 | 86.5 |
📄 License
The model is released under the Apache 2.0 license.

