XLM-R Longformer Model / XLM-Long
XLM-R Longformer (or XLM-Long for short) extends the XLM-R model to support sequence lengths of up to 4096 tokens, aiming to create efficient Transformers for low-resource languages.
Quick Start
XLM-R Longformer (or XLM-Long) is an extended version of the XLM-R model that can handle sequence lengths of up to 4096 tokens instead of the regular 512. It was pre-trained from the XLM-RoBERTa checkpoint on the English WikiText-103 corpus using the Longformer pre-training scheme. The motivation is to explore ways to build efficient Transformers for low-resource languages, such as Swedish, without pre-training on long-context datasets in each respective language. The model is the outcome of a master's thesis project at Peltarion and has been fine-tuned on multilingual question-answering tasks. The code is available here.
Since both XLM-R and Longformer are large models, it is recommended to run them with NVIDIA Apex (16-bit precision), a large GPU, and several gradient accumulation steps.
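As a rough illustration of those memory-saving settings, a Hugging Face Trainer-based fine-tuning setup might look like the sketch below. This is not the configuration used for the released model; the values simply mirror the flags listed under Training Procedure further down.

from transformers import TrainingArguments

# Hypothetical fine-tuning arguments illustrating 16-bit precision and
# gradient accumulation for long 4096-token inputs (placeholder values).
training_args = TrainingArguments(
    output_dir="./output",              # placeholder path
    per_device_train_batch_size=1,      # one long sequence per device per step
    gradient_accumulation_steps=64,     # effective batch size of 64
    fp16=True,                          # mixed precision (Apex or native AMP)
    learning_rate=3e-5,
    max_steps=6000,
)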
Features
- Extended sequence length support: handles sequences of up to 4096 tokens (see the tokenizer sketch after this list).
- Multilingual application: fine-tuned on multilingual question-answering tasks.
- Low-resource language exploration: aims to enable efficient models for low-resource languages.
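A minimal sanity check of the longer context window; the `long_text` string below is a synthetic placeholder rather than a real document.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("markussagen/xlm-roberta-longformer-base-4096")
long_text = " ".join(["word"] * 3000)  # synthetic long document
input_ids = tokenizer(long_text, truncation=True, max_length=4096)["input_ids"]
print(len(input_ids))  # up to 4096 tokens rather than the usual 512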
Installation
The model is hosted on the Hugging Face Hub, so no model-specific installation is required beyond the standard Hugging Face libraries.
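A typical environment setup, assuming PyTorch as the backend (`sentencepiece` is commonly needed for the XLM-R slow tokenizer):

pip install transformers torch sentencepiece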
Usage Examples
Basic Usage
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

MAX_SEQUENCE_LENGTH = 4096
MODEL_NAME_OR_PATH = "markussagen/xlm-roberta-longformer-base-4096"

# Load the tokenizer; long inputs are padded/truncated to 4096 tokens.
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    max_length=MAX_SEQUENCE_LENGTH,
    padding="max_length",
    truncation=True,
)

# Load the model with a question-answering head.
model = AutoModelForQuestionAnswering.from_pretrained(
    MODEL_NAME_OR_PATH,
    max_length=MAX_SEQUENCE_LENGTH,
)
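Continuing the snippet above, a minimal inference sketch; the question and context strings are made up for illustration, and the QA head of this base checkpoint still needs fine-tuning before its predictions are meaningful.

question = "How long can the input sequences be?"  # illustrative input
context = "XLM-Long extends XLM-R to sequences of up to 4096 tokens."

inputs = tokenizer(question, context, return_tensors="pt",
                   truncation=True, max_length=MAX_SEQUENCE_LENGTH)
with torch.no_grad():
    outputs = model(**inputs)

# Decode the highest-scoring answer span.
answer_start = int(outputs.start_logits.argmax())
answer_end = int(outputs.end_logits.argmax()) + 1
answer = tokenizer.decode(inputs["input_ids"][0][answer_start:answer_end])
print(answer)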
Documentation
Training Procedure
The model was trained on the WikiText-103 corpus using a 48GB GPU, with the training script and parameters below. It was pre-trained for 6000 iterations, which took about 5 days. For more information, see the full training script and the GitHub repo.
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip
unzip wikitext-103-raw-v1.zip
export DATA_DIR=./wikitext-103-raw
scripts/run_long_lm.py \
--model_name_or_path xlm-roberta-base \
--model_name xlm-roberta-to-longformer \
--output_dir ./output \
--logging_dir ./logs \
--val_file_path $DATA_DIR/wiki.valid.raw \
--train_file_path $DATA_DIR/wiki.train.raw \
--seed 42 \
--max_pos 4096 \
--adam_epsilon 1e-8 \
--warmup_steps 500 \
--learning_rate 3e-5 \
--weight_decay 0.01 \
--max_steps 6000 \
--evaluate_during_training \
--logging_steps 50 \
--eval_steps 50 \
--save_steps 6000 \
--max_grad_norm 1.0 \
--per_device_eval_batch_size 2 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 64 \
--overwrite_output_dir \
--fp16 \
--do_train \
--do_eval
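Note that with --per_device_train_batch_size 1 and --gradient_accumulation_steps 64, the effective batch size works out to 1 × 64 = 64 sequences per optimizer update; together with --fp16, this keeps the 4096-token inputs within the memory budget of a single 48GB GPU while still training with a reasonably large effective batch.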
License
The model is released under the Apache 2.0 license.
| Property | Details |
|----------|---------|
| Tags | longformer |
| Language | multilingual |
| License | apache-2.0 |
| Datasets | wikitext |