Perceiver IO for language
The Perceiver IO model is pre-trained on the Masked Language Modeling (MLM) task using large-scale text from English Wikipedia and C4, offering a flexible, modality-agnostic architecture for a variety of language tasks.
Quick Start
The Perceiver IO model is pre-trained on the Masked Language Modeling (MLM) task using a large text corpus drawn from a combination of English Wikipedia and C4. It was introduced in the paper Perceiver IO: A General Architecture for Structured Inputs & Outputs by Jaegle et al. and first released in this repository.
Disclaimer: The team releasing Perceiver IO did not write a model card for this model, so this model card has been written by the Hugging Face team.
Features
- Modality-agnostic: Perceiver IO is a transformer encoder model that can be applied to any modality, including text, images, audio, and video.
- Efficient self-attention: Self-attention is applied only to a small set of latent vectors, while the inputs are used for cross-attention, so time and memory requirements are independent of the input size.
- Flexible decoding: Decoder queries flexibly decode the latent hidden states into outputs of arbitrary size and semantics.
- Byte-level processing: The model is trained directly on raw UTF-8 bytes, eliminating the need for tokenizer training and vocabulary maintenance (see the short tokenization sketch after this list).
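As a small illustration of the byte-level point above, here is a minimal sketch (assuming only that the public deepmind/language-perceiver checkpoint is reachable; the example string is arbitrary) showing that the tokenizer maps each UTF-8 byte of the input to an id, with no subword vocabulary involved.

```python
from transformers import PerceiverTokenizer

# Minimal sketch: the tokenizer works directly on raw UTF-8 bytes,
# so there is no subword vocabulary to train or maintain.
tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")

ids = tokenizer("héllo", add_special_tokens=False).input_ids
# "héllo" is 6 UTF-8 bytes ("é" encodes as 2 bytes), so we get 6 ids back.
print(len(ids), ids)
```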
Usage Examples
Basic Usage
```python
import torch
from transformers import PerceiverTokenizer, PerceiverForMaskedLM

tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")
model = PerceiverForMaskedLM.from_pretrained("deepmind/language-perceiver")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

text = "This is an incomplete sentence where some words are missing."
# Encode into UTF-8 byte ids, padded to the model's fixed input length (2048)
encoding = tokenizer(text, padding="max_length", return_tensors="pt")
# Mask the bytes corresponding to " missing."
encoding.input_ids[0, 52:61] = tokenizer.mask_token_id
inputs, input_mask = encoding.input_ids.to(device), encoding.attention_mask.to(device)

# Forward pass; decode greedy per-byte predictions for the masked span
outputs = model(inputs=inputs, attention_mask=input_mask)
logits = outputs.logits
masked_tokens_predictions = logits[0, 51:61].argmax(dim=-1)
print(tokenizer.decode(masked_tokens_predictions))
# should print " missing."
```
Documentation
Intended uses & limitations
You can use the raw model for masked language modeling, but it is intended to be fine-tuned on a labeled dataset. Check the model hub for fine-tuned versions on tasks that interest you.
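For illustration only, a downstream fine-tuning setup might look like the sketch below. PerceiverForSequenceClassification and the num_labels=2 task are assumptions made for this example, not something described in this card; the classification head is freshly initialized and still has to be trained on your labeled data.

```python
from transformers import PerceiverTokenizer, PerceiverForSequenceClassification

# Sketch of a hypothetical downstream task: num_labels=2 is an assumption,
# so the classification decoder starts out randomly initialized.
tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")
model = PerceiverForSequenceClassification.from_pretrained(
    "deepmind/language-perceiver", num_labels=2
)

encoding = tokenizer("A sentence to classify.", padding="max_length", return_tensors="pt")
outputs = model(inputs=encoding.input_ids, attention_mask=encoding.attention_mask)
print(outputs.logits.shape)  # torch.Size([1, 2]) from the untrained head
```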
Training data
| Property | Details |
|----------|---------|
| Model Type | Perceiver IO for language |
| Training Data | A combination of English Wikipedia and C4; 70% of training tokens were sampled from C4 and 30% from Wikipedia. |
Training procedure
Preprocessing
Text preprocessing is straightforward: the text is encoded into UTF-8 bytes and padded to a fixed length of 2048.
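For concreteness, here is a minimal sketch of that preprocessing step using the publicly released tokenizer; the padded length of 2048 is taken from the tokenizer's configured maximum input size, and the example sentence is arbitrary.

```python
from transformers import PerceiverTokenizer

tokenizer = PerceiverTokenizer.from_pretrained("deepmind/language-perceiver")

# Encode the text into UTF-8 byte ids and pad up to the fixed length of 2048.
encoding = tokenizer("A short example sentence.", padding="max_length", return_tensors="pt")
print(encoding.input_ids.shape)              # expected: torch.Size([1, 2048])
print(encoding.attention_mask.sum().item())  # number of non-padding byte positions
```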
Pretraining
Hyperparameter details can be found in Table 9 of the paper.
Evaluation results
This model achieves an average score of 81.8 on the GLUE benchmark. For more details, refer to Table 3 of the original paper.
BibTeX entry and citation info
```bibtex
@article{DBLP:journals/corr/abs-2107-14795,
  author     = {Andrew Jaegle and
                Sebastian Borgeaud and
                Jean{-}Baptiste Alayrac and
                Carl Doersch and
                Catalin Ionescu and
                David Ding and
                Skanda Koppula and
                Daniel Zoran and
                Andrew Brock and
                Evan Shelhamer and
                Olivier J. H{\'{e}}naff and
                Matthew M. Botvinick and
                Andrew Zisserman and
                Oriol Vinyals and
                Jo{\~{a}}o Carreira},
  title      = {Perceiver {IO:} {A} General Architecture for Structured Inputs {\&} Outputs},
  journal    = {CoRR},
  volume     = {abs/2107.14795},
  year       = {2021},
  url        = {https://arxiv.org/abs/2107.14795},
  eprinttype = {arXiv},
  eprint     = {2107.14795},
  timestamp  = {Tue, 03 Aug 2021 14:53:34 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2107-14795.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```
License
This model is released under the Apache-2.0 license.