Perceiver IO for language
The Perceiver IO model is pre-trained on the Masked Language Modeling (MLM) task using large-scale text from English Wikipedia and C4, offering a flexible, modality-agnostic architecture for a variety of language tasks.
Quick Start
The Perceiver IO model is pre-trained on the Masked Language Modeling (MLM) task using a large text corpus drawn from a combination of English Wikipedia and C4. It was introduced in the paper Perceiver IO: A General Architecture for Structured Inputs & Outputs by Jaegle et al. and first released in this repository.
Disclaimer: The team releasing Perceiver IO did not write a model card for this model, so this model card has been written by the Hugging Face team.
Features
- Modality-agnostic: Perceiver IO is a transformer encoder model that can be applied to any modality, including text, images, audio, and video.
- Efficient self-attention: Self-attention is applied only to a small set of latent vectors, while the inputs are used for cross-attention, so time and memory requirements are independent of the input size.
- Flexible decoding: Decoder queries flexibly decode the latent hidden states into outputs of arbitrary size and semantics.
- Byte-level processing: The model is trained directly on raw UTF-8 bytes, eliminating the need for tokenizer training and vocabulary maintenance (see the short tokenization sketch after this list).
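As a small illustration of the byte-level point above, here is a minimal sketch (assuming only that the public deepmind/language-perceiver checkpoint is reachable; the example string is arbitrary) showing that the tokenizer maps each UTF-8 byte of the input to an id, with no subword vocabulary involved.

```python
from transformers import PerceiverTokenizer

# Minimal sketch: the tokenizer works directly on raw UTF-8 bytes,
# so there is no subword vocabulary to train or maintain.
tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")

ids = tokenizer("héllo", add_special_tokens=False).input_ids
# "héllo" is 6 UTF-8 bytes ("é" encodes as 2 bytes), so we get 6 ids back.
print(len(ids), ids)
```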
Usage Examples
Basic Usage
```python
import torch
from transformers import PerceiverTokenizer, PerceiverForMaskedLM

tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")
model = PerceiverForMaskedLM.from_pretrained("deepmind/language-perceiver")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

text = "This is an incomplete sentence where some words are missing."
# Encode into UTF-8 byte ids, padded to the model's fixed input length (2048)
encoding = tokenizer(text, padding="max_length", return_tensors="pt")
# Mask the bytes corresponding to " missing."
encoding.input_ids[0, 52:61] = tokenizer.mask_token_id
inputs, input_mask = encoding.input_ids.to(device), encoding.attention_mask.to(device)

# Forward pass; decode greedy per-byte predictions for the masked span
outputs = model(inputs=inputs, attention_mask=input_mask)
logits = outputs.logits
masked_tokens_predictions = logits[0, 51:61].argmax(dim=-1)
print(tokenizer.decode(masked_tokens_predictions))
# should print " missing."
```
Documentation
Intended uses & limitations
You can use the raw model for masked language modeling, but it is intended to be fine-tuned on a labeled dataset. Check the model hub for fine-tuned versions on tasks that interest you.
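For illustration only, a downstream fine-tuning setup might look like the sketch below. PerceiverForSequenceClassification and the num_labels=2 task are assumptions made for this example, not something described in this card; the classification head is freshly initialized and still has to be trained on your labeled data.

```python
from transformers import PerceiverTokenizer, PerceiverForSequenceClassification

# Sketch of a hypothetical downstream task: num_labels=2 is an assumption,
# so the classification decoder starts out randomly initialized.
tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")
model = PerceiverForSequenceClassification.from_pretrained(
    "deepmind/language-perceiver", num_labels=2
)

encoding = tokenizer("A sentence to classify.", padding="max_length", return_tensors="pt")
outputs = model(inputs=encoding.input_ids, attention_mask=encoding.attention_mask)
print(outputs.logits.shape)  # torch.Size([1, 2]) from the untrained head
```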
Training data
| Property | Details |
|----------|---------|
| Model Type | Perceiver IO for language |
| Training Data | A combination of English Wikipedia and C4; 70% of training tokens were sampled from C4 and 30% from Wikipedia. |
Training procedure
Preprocessing
Text preprocessing is straightforward: the text is encoded into UTF-8 bytes and padded to a fixed length of 2048.
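For concreteness, here is a minimal sketch of that preprocessing step using the publicly released tokenizer; the padded length of 2048 is taken from the tokenizer's configured maximum input size, and the example sentence is arbitrary.

```python
from transformers import PerceiverTokenizer

tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")

# Encode the text into UTF-8 byte ids and pad up to the fixed length of 2048.
encoding = tokenizer("A short example sentence.", padding="max_length", return_tensors="pt")
print(encoding.input_ids.shape)              # expected: torch.Size([1, 2048])
print(encoding.attention_mask.sum().item())  # number of non-padding byte positions
```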
Pretraining
Hyperparameter details can be found in Table 9 of the paper.
Evaluation results
This model achieves an average score of 81.8 on the GLUE benchmark. For more details, refer to Table 3 of the original paper.
BibTeX entry and citation info
```bibtex
@article{DBLP:journals/corr/abs-2107-14795,
  author     = {Andrew Jaegle and
                Sebastian Borgeaud and
                Jean{-}Baptiste Alayrac and
                Carl Doersch and
                Catalin Ionescu and
                David Ding and
                Skanda Koppula and
                Daniel Zoran and
                Andrew Brock and
                Evan Shelhamer and
                Olivier J. H{\'{e}}naff and
                Matthew M. Botvinick and
                Andrew Zisserman and
                Oriol Vinyals and
                Jo{\~{a}}o Carreira},
  title      = {Perceiver {IO:} {A} General Architecture for Structured Inputs {\&} Outputs},
  journal    = {CoRR},
  volume     = {abs/2107.14795},
  year       = {2021},
  url        = {https://arxiv.org/abs/2107.14795},
  eprinttype = {arXiv},
  eprint     = {2107.14795},
  timestamp  = {Tue, 03 Aug 2021 14:53:34 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2107-14795.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```
License
This model is released under the Apache-2.0 license.