Pino Bigbird Roberta Base

Developed by flax-community

Pino is a pre-trained Dutch model based on the BigBird architecture. It uses sparse attention to handle long texts and supports sequences of up to 4096 tokens.

Release Time: 3/2/2022

Model Overview

BigBird is a Transformer model based on sparse attention, capable of efficiently processing long sequence texts. This model is pre-trained specifically for Dutch and is suitable for tasks requiring long-text processing.

Model Features

Long Sequence Processing Capability

Uses block sparse attention to process sequences of up to 4096 tokens at a significantly lower computational cost than Transformers with full attention.

Dutch Language Optimization

Pre-trained specifically for Dutch using mC4 and Dutch news datasets.

Flexible Attention Configuration

Supports full attention mode and block sparse mode, with adjustable block_size and num_random_blocks parameters.

Model Capabilities

Long-text understanding

Dutch text processing

Masked language modeling

Use Cases

Natural Language Processing

Long Document Summarization

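Process and analyze long documents to generate summaries (the base model is pre-trained with masked language modeling only, so this requires task-specific fine-tuning)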

Long-context Question Answering

Answer complex questions based on long document content (likewise after task-specific fine-tuning)

🚀 Pino (Dutch BigBird) base model

Pino is a pre-trained model for the Dutch language, based on the BigBird architecture, which extends Transformer-based models to handle much longer sequences.

This model was created by Dat Nguyen & Yeb Havinga during the Hugging Face community week. (Not finished yet)

BigBird is a sparse-attention-based Transformer that extends Transformer-based models, such as BERT, to much longer sequences. It also comes with a theoretical understanding of which capabilities of a full Transformer the sparse model retains. Pino is pre-trained on Dutch text using a masked language modeling (MLM) objective. BigBird was introduced in this paper (see the BibTeX entry below) and first released in this repository.
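To make the masked language modeling objective concrete, here is a minimal sketch that masks a fraction of the tokens in a whitespace-split Dutch sentence. The `<mask>` string, the whitespace splitting, the masking probability, and the `mask_tokens` helper are illustrative assumptions in the style of common BERT-like setups, not details confirmed for Pino's training.

```python
import random

MASK_TOKEN = "<mask>"  # assumed placeholder; the real tokenizer defines its own mask token


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) for a simplified MLM example.

    labels holds the original token at masked positions and None elsewhere,
    so a model would only be scored on the positions it has to reconstruct.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


sentence = "Amsterdam is de hoofdstad van Nederland".split()
masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

During pre-training the model sees the masked sequence and is trained to predict the original tokens at the masked positions.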

✨ Features

BigBird relies on block sparse attention instead of full attention (as used in BERT, for example). It can handle sequences of up to 4096 tokens at a much lower compute cost than BERT. The BigBird architecture has achieved state-of-the-art results on various tasks involving very long sequences, such as long document summarization and question answering with long contexts.

📦 Installation

No specific installation steps are provided in the original document.

💻 Usage Examples

Basic Usage

Here is how to load this model in PyTorch as the starting point for extracting features from a given text:

from transformers import BigBirdModel

# by default it is in `block_sparse` mode with num_random_blocks=3, block_size=64
model = BigBirdModel.from_pretrained("flax-community/pino-bigbird-roberta-base")

# you can change `attention_type` to full attention like this:
model = BigBirdModel.from_pretrained("flax-community/pino-bigbird-roberta-base", attention_type="original_full")

# you can change `block_size` & `num_random_blocks` like this:
model = BigBirdModel.from_pretrained("flax-community/pino-bigbird-roberta-base", block_size=16, num_random_blocks=2)

📚 Documentation

Training Data

This model is pre-trained on publicly available data: the mC4 corpus and scraped Dutch news from NRC and Nu.nl. It uses a fast, universal byte-level BPE (BBPE) tokenizer rather than a SentencePiece tokenizer, following the same byte-level approach as RoBERTa (which in turn borrowed it from GPT-2).
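The idea behind byte-level BPE can be sketched as follows: text is first split into UTF-8 bytes, so any Dutch string (including accented characters) is representable from a base vocabulary of 256 symbols, and frequent adjacent pairs are then merged into new symbols. This is a simplified illustration with hypothetical helper names, not the tokenizer shipped with the model.

```python
from collections import Counter


def to_byte_symbols(text):
    """Split text into its UTF-8 bytes, the base symbols of byte-level BPE."""
    return list(text.encode("utf-8"))


def most_frequent_pair(symbols):
    """Return the most common adjacent pair, the candidate for the next merge."""
    pairs = Counter(zip(symbols, symbols[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of pair with one tuple symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(pair)
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


symbols = to_byte_symbols("één keer, twee keer")
pair = most_frequent_pair(symbols)
print(len(symbols), pair, len(merge_pair(symbols, pair)))
```

A real tokenizer repeats the merge step many times over a large corpus and stores the learned merges as its vocabulary.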

Training Procedure

The data is cleaned as follows:

Remove texts containing HTML code, JavaScript code, lorem ipsum placeholder text, or policy notices
Remove lines that do not end with a sentence-ending punctuation mark
Remove texts and words that are too short
Remove texts and words that are too long
Remove bad words
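The cleaning rules above could be sketched roughly as follows. Every threshold, pattern, and name here (`MIN_CHARS`, `MAX_WORD_LEN`, `MARKUP_PATTERN`, `BAD_WORDS`, `clean_document`) is a hypothetical placeholder, since the source does not state the actual values or code, and the sketch reflects only one reading of the "too short" and "too long" rules.

```python
import re

# All thresholds and patterns below are illustrative assumptions.
MIN_CHARS, MAX_CHARS = 20, 100_000
MAX_WORD_LEN = 40
MARKUP_PATTERN = re.compile(
    r"<[^>]+>|javascript|lorem\s*ipsum|privacy\s+policy|cookie\s+policy",
    re.IGNORECASE,
)
END_MARKS = (".", "!", "?", '"')
BAD_WORDS = {"badword"}  # placeholder for a real block list


def clean_document(text):
    """Return the cleaned text, or None if the whole document is dropped."""
    # Drop documents with HTML/JavaScript remnants, lorem ipsum, or policy text.
    if MARKUP_PATTERN.search(text):
        return None
    kept = []
    for line in text.splitlines():
        line = line.strip()
        # Drop lines that do not end with a sentence-ending mark.
        if not line.endswith(END_MARKS):
            continue
        words = line.split()
        # Drop lines containing implausibly long words.
        if any(len(word) > MAX_WORD_LEN for word in words):
            continue
        # Drop lines containing a block-listed word.
        if any(word.lower().strip('.,!?"') in BAD_WORDS for word in words):
            continue
        kept.append(line)
    cleaned = "\n".join(kept)
    # Drop documents that end up too short or too long.
    if not (MIN_CHARS <= len(cleaned) <= MAX_CHARS):
        return None
    return cleaned


doc = "Dit is een voldoende lange Nederlandse zin over het weer.\nMenu Home Contact"
print(clean_document(doc))
```

In this example the navigation-like second line is dropped because it lacks an end mark, while the first line is kept.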

🔧 Technical Details

BigBird extends Transformer-based models to much longer sequences by using block sparse attention. This allows it to process sequences of up to 4096 tokens at a lower computational cost than full-attention models such as BERT.
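As a rough, back-of-the-envelope illustration of where the savings come from, the sketch below counts query-key score entries for full attention versus a simplified block sparse pattern. The defaults `block_size=64` and `num_random_blocks=3` come from the usage example above; the assumption that each query block attends to a 3-block sliding window plus 2 global blocks is a simplification for illustration, and the function names are hypothetical.

```python
def full_attention_scores(seq_len):
    """Number of query-key score entries with full attention."""
    return seq_len * seq_len


def block_sparse_attention_scores(seq_len, block_size=64, num_random_blocks=3,
                                  window_blocks=3, global_blocks=2):
    """Approximate score entries when each query block sees only a few key blocks.

    Simplification: every query block attends to the same number of key blocks
    (sliding window + global + random), ignoring edge effects.
    """
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be a multiple of block_size")
    num_blocks = seq_len // block_size
    blocks_per_query = min(num_blocks,
                           window_blocks + global_blocks + num_random_blocks)
    return seq_len * blocks_per_query * block_size


n = 4096
full = full_attention_scores(n)
sparse = block_sparse_attention_scores(n)
print(full, sparse, full / sparse)
```

Under these assumptions the full-attention count grows quadratically with sequence length, while the block sparse count grows linearly, because the number of key blocks per query block stays fixed.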

📄 License

No license information is provided in the original document.

BibTeX entry and citation info

@misc{zaheer2021big,
      title={Big Bird: Transformers for Longer Sequences}, 
      author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
      year={2021},
      eprint={2007.14062},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
