# XLM-R Longformer Model
XLM-R Longformer is an extended XLM-R model that supports sequence lengths up to 4096 tokens instead of the typical 512. It was pre-trained from the XLM-RoBERTa checkpoint using the Longformer pre-training scheme on the English WikiText-103 corpus. The goal was to explore methods for creating efficient Transformers for low-resource languages, such as Swedish, without pre-training on long-context datasets in each respective language. The trained model is the outcome of a master's thesis project at Peltarion and was fine-tuned on multilingual question-answering tasks. The code is available [here](https://github.com/MarkusSagen/Master-Thesis-Multilingual-Longformer#xlm-r).
## Features
- Extended XLM-R model supporting sequence lengths up to 4096 tokens.
- Pre-trained on the English WikiText-103 corpus.
- Fine-tuned on multilingual question-answering tasks.
## Installation
The original README does not provide installation details. To use the model, you need the required Python libraries, such as `torch` and `transformers`, which can be installed with `pip`:

```bash
pip install torch transformers
```
## Usage Examples
### Basic Usage
```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

MAX_SEQUENCE_LENGTH = 4096
MODEL_NAME_OR_PATH = "markussagen/xlm-roberta-longformer-base-4096"

# Tokenizer configured for the extended 4096-token context
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    max_length=MAX_SEQUENCE_LENGTH,
    padding="max_length",
    truncation=True,
)

# Question-answering model loaded with the extended maximum length
model = AutoModelForQuestionAnswering.from_pretrained(
    MODEL_NAME_OR_PATH,
    max_length=MAX_SEQUENCE_LENGTH,
)
```
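
Once loaded, the tokenizer and model can be used for extractive question answering in the usual `transformers` way. The snippet below is an illustrative sketch that is not part of the original README; the question and context strings are made-up examples, and it assumes the checkpoint carries a fine-tuned question-answering head.

```python
# Hypothetical usage example (not from the original README): encode a
# question/context pair and decode the predicted answer span.
question = "Where was the model trained?"
context = "The model was pre-trained on the English WikiText-103 corpus."

inputs = tokenizer(
    question,
    context,
    return_tensors="pt",
    truncation=True,
    max_length=MAX_SEQUENCE_LENGTH,
)

with torch.no_grad():
    outputs = model(**inputs)

# Most likely start and end positions of the answer span
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())

answer_tokens = inputs["input_ids"][0][start : end + 1]
print(tokenizer.decode(answer_tokens, skip_special_tokens=True))
```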
## Documentation
### Training Procedure
The model was trained on the WikiText-103 corpus using a 48GB GPU with the following training script and parameters. The model was pre-trained for 6000 iterations, which took approximately 5 days. See the full [training script](https://github.com/MarkusSagen/Master-Thesis-Multilingual-Longformer/blob/main/scripts/finetune_qa_models.py) and the [GitHub repo](https://github.com/MarkusSagen/Master-Thesis-Multilingual-Longformer) for more information.
```bash
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip
unzip wikitext-103-raw-v1.zip
export DATA_DIR=./wikitext-103-raw

scripts/run_long_lm.py \
    --model_name_or_path xlm-roberta-base \
    --model_name xlm-roberta-to-longformer \
    --output_dir ./output \
    --logging_dir ./logs \
    --val_file_path $DATA_DIR/wiki.valid.raw \
    --train_file_path $DATA_DIR/wiki.train.raw \
    --seed 42 \
    --max_pos 4096 \
    --adam_epsilon 1e-8 \
    --warmup_steps 500 \
    --learning_rate 3e-5 \
    --weight_decay 0.01 \
    --max_steps 6000 \
    --evaluate_during_training \
    --logging_steps 50 \
    --eval_steps 50 \
    --save_steps 6000 \
    --max_grad_norm 1.0 \
    --per_device_eval_batch_size 2 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --overwrite_output_dir \
    --fp16 \
    --do_train \
    --do_eval
```
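
The script converts the XLM-R checkpoint to a long-context model before continuing masked language model pre-training on WikiText-103. As an illustration of the core idea, the sketch below shows how the 512-token position embeddings of `xlm-roberta-base` can be tiled out to 4096 positions; it follows the publicly documented Longformer "convert to long" recipe and is not the exact code from the linked repository (the output paths and helper logic here are assumptions).

```python
# Illustrative sketch only: extend XLM-R's learned position embeddings from
# 512 to 4096 positions, as in the general Longformer conversion recipe.
# This is NOT the exact script used to produce this model.
import torch
from transformers import XLMRobertaForMaskedLM, XLMRobertaTokenizerFast

MAX_POS = 4096

model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base")
tokenizer = XLMRobertaTokenizerFast.from_pretrained(
    "xlm-roberta-base", model_max_length=MAX_POS
)

embeddings = model.roberta.embeddings

with torch.no_grad():
    old_pos_embed = embeddings.position_embeddings.weight  # (512 + 2, hidden_size)

    # RoBERTa-style models reserve the first two position ids, so the table has
    # 512 + 2 rows; the extended table needs 4096 + 2 rows.
    old_max_pos, embed_dim = old_pos_embed.shape
    new_max_pos = MAX_POS + 2

    new_pos_embed = old_pos_embed.new_empty(new_max_pos, embed_dim)
    new_pos_embed[:2] = old_pos_embed[:2]

    # Tile the learned 512 position vectors until all 4096 positions are filled.
    k, step = 2, old_max_pos - 2
    while k < new_max_pos:
        chunk = min(step, new_max_pos - k)
        new_pos_embed[k : k + chunk] = old_pos_embed[2 : 2 + chunk]
        k += chunk

    # Swap in the extended embedding table and record the new maximum length.
    embeddings.position_embeddings = torch.nn.Embedding.from_pretrained(
        new_pos_embed, freeze=False, padding_idx=model.config.pad_token_id
    )
    model.config.max_position_embeddings = new_max_pos

# The full conversion also replaces each self-attention module with
# Longformer's sliding-window attention before continuing masked language
# model pre-training on WikiText-103; see the linked repository for details.
model.save_pretrained("./xlm-roberta-long-4096")
tokenizer.save_pretrained("./xlm-roberta-long-4096")
```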
## Technical Details
Since both XLM-R and Longformer are large models, it is recommended to run them with NVIDIA Apex (16-bit precision), a large GPU, and several gradient accumulation steps.
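
For example, with the Hugging Face `Trainer`, 16-bit precision and gradient accumulation can be enabled through `TrainingArguments`. The snippet below is an illustrative sketch that is not part of the original README; the values simply mirror the flags in the training script above.

```python
# Illustrative only: fp16 and gradient accumulation via TrainingArguments,
# with values mirroring the training script above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    fp16=True,                       # 16-bit mixed precision (Apex or native AMP)
    per_device_train_batch_size=1,   # small per-step batch to fit 4096-token inputs
    gradient_accumulation_steps=64,  # effective batch size of 1 x 64 per device
    learning_rate=3e-5,
    weight_decay=0.01,
    warmup_steps=500,
    max_steps=6000,
    max_grad_norm=1.0,
    seed=42,
)
```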
## License
This model is released under the Apache-2.0 license.

| Property | Details |
|----------|---------|
| Model Type | XLM-R Longformer |
| Training Data | WikiText-103 |
| License | Apache-2.0 |