XLM-R Longformer Model / XLM-Long
XLM-R Longformer (or XLM-Long for short) extends the XLM-R model to support sequence lengths of up to 4096 tokens, aiming to create efficient Transformers for low-resource languages.
Quick Start
XLM-R Longformer (or XLM-Long) is an extended version of the XLM-R model that can handle sequence lengths of up to 4096 tokens instead of the regular 512. It was pre-trained from the XLM-RoBERTa checkpoint on the English WikiText-103 corpus using the Longformer pre-training scheme. The motivation is to explore ways to build efficient Transformers for low-resource languages, such as Swedish, without pre-training on long-context datasets in each respective language. The model is the outcome of a master's thesis project at Peltarion and has been fine-tuned on multilingual question-answering tasks. The code is available here.
Since both XLM-R and Longformer are large models, it is recommended to run them with NVIDIA Apex (16-bit precision), a large GPU, and several gradient accumulation steps.
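As a rough illustration of those memory-saving settings, a Hugging Face Trainer-based fine-tuning setup might look like the sketch below. This is not the configuration used for the released model; the values simply mirror the flags listed under Training Procedure further down.

from transformers import TrainingArguments

# Hypothetical fine-tuning arguments illustrating 16-bit precision and
# gradient accumulation for long 4096-token inputs (placeholder values).
training_args = TrainingArguments(
    output_dir="./output",              # placeholder path
    per_device_train_batch_size=1,      # one long sequence per device per step
    gradient_accumulation_steps=64,     # effective batch size of 64
    fp16=True,                          # mixed precision (Apex or native AMP)
    learning_rate=3e-5,
    max_steps=6000,
)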
Features
- Extended sequence length support: handles sequences of up to 4096 tokens (see the tokenizer sketch after this list).
- Multilingual application: fine-tuned on multilingual question-answering tasks.
- Low-resource language exploration: aims to enable efficient models for low-resource languages.
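A minimal sanity check of the longer context window; the `long_text` string below is a synthetic placeholder rather than a real document.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("markussagen/xlm-roberta-longformer-base-4096")
long_text = " ".join(["word"] * 3000)  # synthetic long document
input_ids = tokenizer(long_text, truncation=True, max_length=4096)["input_ids"]
print(len(input_ids))  # up to 4096 tokens rather than the usual 512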
Installation
The model is hosted on the Hugging Face Hub, so no model-specific installation is required beyond the standard Hugging Face libraries.
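A typical environment setup, assuming PyTorch as the backend (`sentencepiece` is commonly needed for the XLM-R slow tokenizer):

pip install transformers torch sentencepiece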
Usage Examples
Basic Usage
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

MAX_SEQUENCE_LENGTH = 4096
MODEL_NAME_OR_PATH = "markussagen/xlm-roberta-longformer-base-4096"

# Load the tokenizer; long inputs are padded/truncated to 4096 tokens.
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    max_length=MAX_SEQUENCE_LENGTH,
    padding="max_length",
    truncation=True,
)

# Load the model with a question-answering head.
model = AutoModelForQuestionAnswering.from_pretrained(
    MODEL_NAME_OR_PATH,
    max_length=MAX_SEQUENCE_LENGTH,
)
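Continuing the snippet above, a minimal inference sketch; the question and context strings are made up for illustration, and the QA head of this base checkpoint still needs fine-tuning before its predictions are meaningful.

question = "How long can the input sequences be?"  # illustrative input
context = "XLM-Long extends XLM-R to sequences of up to 4096 tokens."

inputs = tokenizer(question, context, return_tensors="pt",
                   truncation=True, max_length=MAX_SEQUENCE_LENGTH)
with torch.no_grad():
    outputs = model(**inputs)

# Decode the highest-scoring answer span.
answer_start = int(outputs.start_logits.argmax())
answer_end = int(outputs.end_logits.argmax()) + 1
answer = tokenizer.decode(inputs["input_ids"][0][answer_start:answer_end])
print(answer)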
Documentation
Training Procedure
The model was trained on the WikiText-103 corpus using a 48GB GPU, with the training script and parameters below. It was pre-trained for 6000 iterations, which took about 5 days. For more information, see the full training script and the GitHub repo.
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip
unzip wikitext-103-raw-v1.zip
export DATA_DIR=./wikitext-103-raw
scripts/run_long_lm.py \
--model_name_or_path xlm-roberta-base \
--model_name xlm-roberta-to-longformer \
--output_dir ./output \
--logging_dir ./logs \
--val_file_path $DATA_DIR/wiki.valid.raw \
--train_file_path $DATA_DIR/wiki.train.raw \
--seed 42 \
--max_pos 4096 \
--adam_epsilon 1e-8 \
--warmup_steps 500 \
--learning_rate 3e-5 \
--weight_decay 0.01 \
--max_steps 6000 \
--evaluate_during_training \
--logging_steps 50 \
--eval_steps 50 \
--save_steps 6000 \
--max_grad_norm 1.0 \
--per_device_eval_batch_size 2 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 64 \
--overwrite_output_dir \
--fp16 \
--do_train \
--do_eval
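Note that with --per_device_train_batch_size 1 and --gradient_accumulation_steps 64, the effective batch size works out to 1 × 64 = 64 sequences per optimizer update; together with --fp16, this keeps the 4096-token inputs within the memory budget of a single 48GB GPU while still training with a reasonably large effective batch.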
License
The model is released under the Apache 2.0 license.
| Property | Details |
|----------|---------|
| Tags | longformer |
| Language | multilingual |
| License | apache-2.0 |
| Datasets | wikitext |