Pino (Dutch BigBird) base model
Pino is a pre-trained model for the Dutch language, based on the BigBird architecture, which extends Transformer-based models to handle much longer sequences.
This model was created by Dat Nguyen & Yeb Havinga during the Hugging Face community week. (Not finished yet)
BigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to handle much longer sequences. It also comes with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle. Pino is pre-trained on the Dutch language using a masked language modeling (MLM) objective. BigBird was introduced in the paper "Big Bird: Transformers for Longer Sequences" (arXiv:2007.14062) and first released in this repository.
⨠Features
BigBird relies on block sparse attention instead of normal attention (e.g., BERT's attention). It can handle sequences up to a length of 4096 at a much lower compute cost compared to BERT. It has achieved state - of - the - art results on various tasks involving very long sequences, such as long document summarization and question - answering with long contexts.
Usage Examples
Basic Usage
Here is how to use this model to get the features of a given text in PyTorch:
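from transformers import BigBirdModel
# The default attention type is block sparse attention.
model = BigBirdModel.from_pretrained("flax-community/pino-bigbird-roberta-base")
# Alternatively, switch to full attention via `attention_type`:
model = BigBirdModel.from_pretrained("flax-community/pino-bigbird-roberta-base", attention_type="original_full")
# Or change `block_size` and `num_random_blocks` of the block sparse attention:
model = BigBirdModel.from_pretrained("flax-community/pino-bigbird-roberta-base", block_size=16, num_random_blocks=2)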
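The snippet above only loads the model. To obtain features for a piece of text, the input also has to be tokenized and passed through the model. The following is a minimal sketch, assuming the model repository ships tokenizer files that `AutoTokenizer` can load (not confirmed here); the helper name `extract_features` is hypothetical.

```python
def extract_features(text, model_name="flax-community/pino-bigbird-roberta-base"):
    """Return the last hidden states for `text` (batch x tokens x hidden size)."""
    # Imported lazily so that defining this helper does not load the
    # heavy dependencies until it is actually called.
    import torch
    from transformers import AutoTokenizer, BigBirdModel

    # Assumption: the repository provides a tokenizer usable by AutoTokenizer.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = BigBirdModel.from_pretrained(model_name)
    model.eval()

    encoded_input = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        output = model(**encoded_input)
    return output.last_hidden_state
```

A call such as `extract_features("Amsterdam is de hoofdstad van Nederland.")` would download the model on first use and return one feature vector per input token.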
Documentation
Training Data
This model is pre-trained on publicly available data: mC4 and scraped Dutch news from NRC and Nu.nl. It uses the fast, universal byte-level BPE (BBPE) tokenizer and vocabulary, the same approach as RoBERTa (which in turn borrowed it from GPT-2), rather than a SentencePiece tokenizer.
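As a rough illustration of why a byte-level vocabulary is "universal": every string can be written as UTF-8 bytes, so a base vocabulary of 256 byte values covers any Dutch text without an unknown token. The sketch below shows only this byte-level starting point, not the learned BPE merges or the tokenizer shipped with the model; `to_byte_tokens` and `from_byte_tokens` are hypothetical helpers.

```python
def to_byte_tokens(text):
    """Map text to its UTF-8 byte values, the base symbols of byte-level BPE."""
    return list(text.encode("utf-8"))


def from_byte_tokens(byte_tokens):
    """Invert to_byte_tokens: rebuild the original string from byte values."""
    return bytes(byte_tokens).decode("utf-8")


# "\u00e9\u00e9n" is the Dutch word for "one" (e-acute, e-acute, n).
tokens = to_byte_tokens("\u00e9\u00e9n")
# Each accented character takes two bytes, the plain ASCII letter one.
print(tokens)  # [195, 169, 195, 169, 110]
```

A real BBPE tokenizer then merges frequent byte sequences into larger vocabulary entries, so common Dutch words end up as one or a few tokens.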
Training Procedure
The data is cleaned as follows:
- Remove texts containing HTML code, JavaScript code, lorem ipsum, or policy text.
- Remove lines without an end mark.
- Remove texts and words that are too short.
- Remove texts and words that are too long.
- Remove bad words.
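The cleaning rules above can be sketched as a simple filter. This is an illustration only: the source does not give thresholds, patterns, or word lists, so every constant below is an assumption. The sketch also interprets the word-level rules as dropping the lines that contain offending words and omits a per-word minimum length, which is one possible reading.

```python
import re

# Every constant here is an illustrative assumption, not a value used for Pino.
UNWANTED = re.compile(
    r"<[^>]+>|\bfunction\s*\(|lorem ipsum|cookie policy|privacy policy",
    re.IGNORECASE,
)
END_MARKS = (".", "!", "?", '"')
MIN_WORDS_PER_LINE = 3
MAX_WORD_CHARS = 40
MIN_TEXT_WORDS, MAX_TEXT_WORDS = 5, 100_000
BAD_WORDS = {"badword"}  # placeholder list


def keep_line(line):
    """Apply the line-level rules: end mark, word count, word length, bad words."""
    words = line.split()
    if not line.endswith(END_MARKS):
        return False
    if len(words) < MIN_WORDS_PER_LINE:
        return False
    if any(len(w) > MAX_WORD_CHARS for w in words):
        return False
    if any(w.strip(".,!?\"'").lower() in BAD_WORDS for w in words):
        return False
    return True


def clean_text(text):
    """Return the cleaned text, or None if the whole text should be dropped."""
    if UNWANTED.search(text):
        return None
    lines = [ln.strip() for ln in text.splitlines()]
    kept = [ln for ln in lines if ln and keep_line(ln)]
    n_words = sum(len(ln.split()) for ln in kept)
    if not MIN_TEXT_WORDS <= n_words <= MAX_TEXT_WORDS:
        return None
    return "\n".join(kept)
```

For example, a scraped page consisting of one full sentence followed by navigation fragments such as "Menu" would be reduced to the sentence alone, while a page containing HTML tags would be dropped entirely.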
Technical Details
BigBird extends Transformer-based models to handle much longer sequences by using block sparse attention. This allows it to process sequences of up to 4096 tokens at a lower computational cost than models like BERT.
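To give a feel for the savings, the sketch below counts attention scores for full attention versus a simplified block sparse pattern in which each query block attends to a fixed number of global, sliding-window, and random key blocks. The block counts used as defaults (block size 64, 3 window blocks, 2 global blocks, 3 random blocks) are assumptions chosen for illustration, and the count ignores edge effects, so it approximates rather than reproduces the library's exact attention pattern.

```python
def full_attention_scores(seq_len):
    """Every token attends to every token: seq_len ** 2 scores."""
    return seq_len * seq_len


def block_sparse_scores(seq_len, block_size=64, window_blocks=3,
                        global_blocks=2, random_blocks=3):
    """Approximate score count when each query block sees a fixed number of key blocks."""
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be a multiple of block_size")
    num_blocks = seq_len // block_size
    # A query block cannot attend to more key blocks than exist.
    key_blocks = min(num_blocks, window_blocks + global_blocks + random_blocks)
    return num_blocks * key_blocks * block_size * block_size


full = full_attention_scores(4096)
sparse = block_sparse_scores(4096)
print(full, sparse, full / sparse)  # 16777216 2097152 8.0
```

Under these assumptions the sparse count grows linearly with sequence length while the full count grows quadratically: doubling the length to 8192 doubles the sparse count but quadruples the full one.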
BibTeX entry and citation info
@misc{zaheer2021big,
title={Big Bird: Transformers for Longer Sequences},
author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
year={2021},
eprint={2007.14062},
archivePrefix={arXiv},
primaryClass={cs.LG}
}