Funnel Transformer intermediate model (B6-6-6 without decoder)
A model pretrained on English, similar to ELECTRA, useful for extracting features for downstream tasks.
Quick Start
This model is pretrained on English using an objective similar to ELECTRA's. It was introduced in this paper and first released in this repository. The model is uncased: it treats "english" and "English" the same.
Disclaimer: The team releasing Funnel Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team.
Features
- Pretrained on a large corpus of English data in a self-supervised fashion.
- Learns an inner representation of the English language for downstream tasks.
- Outputs hidden states with a sequence length of one-fourth of the inputs (without decoder).
Documentation
Model description
Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised manner. It was pretrained on raw texts only, with no human labeling, using an automatic process to generate inputs and labels from those texts.
Specifically, a small language model corrupts the input texts by replacing some tokens, and the pretraining objective is to predict which tokens are original and which were replaced, somewhat like the discriminator in GAN training.
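As a minimal, purely illustrative sketch of this corruption scheme (the token list, replacement vocabulary, and corruption probability below are invented for the example, not taken from the actual pretraining setup):

```python
import random

random.seed(0)

tokens = ["the", "cat", "sat", "on", "the", "mat"]
replacements = ["dog", "ran", "hat", "a"]  # stand-ins for generator outputs
corrupt_prob = 0.15  # assumed corruption rate for this sketch

corrupted, labels = [], []
for tok in tokens:
    if random.random() < corrupt_prob:
        # the small generator model swaps in a plausible token
        corrupted.append(random.choice(replacements))
        labels.append(1)  # token was replaced
    else:
        corrupted.append(tok)
        labels.append(0)  # token is original

# The pretrained model is trained to recover `labels` from `corrupted`.
```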
The model learns an inner representation of English, which can be used to extract features for downstream tasks. For example, you can train a standard classifier using the features produced by this model as inputs.
Note: This model does not contain the decoder, so it outputs hidden states with a sequence length of one-fourth of the inputs. It is suitable for tasks that only need a summary of the sentence (like sentence classification), but not for tasks that need one output per input token; in that case, use the intermediate model, which includes the decoder.
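The one-fourth figure comes from the encoder pooling the sequence between its blocks. A rough back-of-the-envelope calculation, assuming the B6-6-6 architecture's three blocks with stride-2 pooling between consecutive blocks, looks like this:

```python
import math

def funnel_output_length(n: int, num_blocks: int = 3, stride: int = 2) -> int:
    """Sequence length after the encoder, assuming stride-`stride`
    pooling between each pair of consecutive blocks."""
    for _ in range(num_blocks - 1):
        n = math.ceil(n / stride)
    return n

print(funnel_output_length(512))  # 128, i.e. one-fourth of the input
```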
Intended uses & limitations
You can use the raw model to extract vector representations of text, but it's mainly for fine-tuning on downstream tasks. Check the model hub for fine-tuned versions.
This model is mainly intended for fine-tuning on tasks that use the whole sentence, such as sequence classification, token classification, or question answering. For text generation, consider models like GPT-2 instead.
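To illustrate how the pooled hidden states could feed a sentence classifier, here is a framework-free sketch using random numbers in place of real model outputs (the shapes assume a hidden size of 768 and a pooled sequence of 128 tokens; the linear head is an illustration, not the card's recommended recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for FunnelBaseModel hidden states: (batch, seq_len / 4, hidden).
hidden_states = rng.standard_normal((2, 128, 768))

# Mean-pool over the (already shortened) sequence to get sentence features.
features = hidden_states.mean(axis=1)   # shape (2, 768)

# A randomly initialised linear head for 2 classes (illustrative only).
weights = rng.standard_normal((768, 2))
logits = features @ weights             # shape (2, 2)
```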
How to use
Usage Examples
Basic Usage
```python
# PyTorch
from transformers import FunnelTokenizer, FunnelBaseModel

tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/intermediate-base")
model = FunnelBaseModel.from_pretrained("funnel-transformer/intermediate-base")

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
```
```python
# TensorFlow
from transformers import FunnelTokenizer, TFFunnelBaseModel

tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/intermediate-base")
model = TFFunnelBaseModel.from_pretrained("funnel-transformer/intermediate-base")

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="tf")
output = model(encoded_input)
```
Training data
The model was pretrained on:
- BookCorpus, a dataset of unpublished books
- English Wikipedia (excluding lists, tables, and headers)
- Clue Web, a dataset of English web pages
- GigaWord, an archive of newswire text data
- Common Crawl, a dataset of raw web pages
BibTeX entry and citation info
@misc{dai2020funneltransformer,
title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
year={2020},
eprint={2006.03236},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
License
This model is licensed under the Apache 2.0 license.