Funnel Transformer medium model (B6-3x2-3x2 without decoder)
This is a pre-trained model on the English language, using a similar objective to ELECTRA. It was introduced in [this paper](https://arxiv.org/abs/2006.03236) and first released in [this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not distinguish between "english" and "English".
Disclaimer: The team releasing Funnel Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team.
🚀 Quick Start
The Funnel Transformer model can be used to extract features for downstream tasks. You can use the raw model to get a vector representation of text, but it is mainly designed for fine-tuning on specific tasks. Check the [model hub](https://huggingface.co/models?filter=funnel-transformer) for fine-tuned versions.
✨ Features
- Self-supervised pretraining: Pretrained on a large English corpus in a self-supervised way, learning an inner representation of the English language.
- Feature extraction: Can extract useful features for downstream tasks such as sequence classification, token classification, or question answering.
- No decoder: Outputs hidden states with a sequence length of one-fourth of the input, suitable for tasks requiring sentence summaries.
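The "one-fourth" output length comes from the funnel shape of the encoder: the sequence is pooled (stride 2) between blocks, so two pooling steps halve the length twice. As a rough illustration of that length arithmetic only (plain Python, not the actual model code, which pools attention queries rather than raw vectors):

```python
# Illustrative sketch of funnel-style sequence shortening (not the real model code):
# stride-2 pooling between blocks halves the sequence length, and two pooling
# steps leave an output length of one-fourth of the input.

def mean_pool_stride2(hidden):
    """Average each pair of adjacent positions (stride-2 mean pooling)."""
    return [(hidden[i] + hidden[i + 1]) / 2 for i in range(0, len(hidden) - 1, 2)]

seq = [float(i) for i in range(16)]            # pretend hidden states for 16 input tokens
after_block1 = mean_pool_stride2(seq)          # length 8
after_block2 = mean_pool_stride2(after_block1) # length 4 = one-fourth of the input

print(len(seq), len(after_block1), len(after_block2))  # 16 8 4
```

Without a decoder to upsample back to the input length, the model's final hidden states stay at this shortened length, which is why one output per input token is not available from this checkpoint.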
📦 Installation
No specific installation steps are provided in the original document.
💻 Usage Examples
Basic Usage
Here is how to use this model to get the features of a given text in PyTorch:
```python
from transformers import FunnelTokenizer, FunnelBaseModel

tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/medium-base")
model = FunnelBaseModel.from_pretrained("funnel-transformer/medium-base")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
```
Advanced Usage
Here is the usage in TensorFlow:
```python
from transformers import FunnelTokenizer, TFFunnelBaseModel

tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/medium-base")
model = TFFunnelBaseModel.from_pretrained("funnel-transformer/medium-base")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="tf")
output = model(encoded_input)
```
📚 Documentation
Model description
Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. It was pretrained on raw texts only, with an automatic process to generate inputs and labels from those texts. A small language model corrupts the input texts and serves as a generator of inputs for this model. The pretraining objective is to predict which tokens are original and which have been replaced, somewhat like GAN training.
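As a toy illustration of this replaced-token-detection objective (hypothetical tokens, not the actual pretraining pipeline): a small generator swaps out some tokens, and the model is trained to label each position as original or replaced:

```python
# Toy sketch of ELECTRA-style replaced-token detection (hypothetical example,
# not the actual Funnel Transformer pretraining code).

original  = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]  # generator swapped "cooked" -> "ate"

# The model's target: 0 = token kept from the original, 1 = token was replaced.
labels = [int(o != c) for o, c in zip(original, corrupted)]
print(labels)  # [0, 0, 1, 0, 0]
```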
The model learns an inner representation of the English language, which can be used to extract features for downstream tasks. For example, if you have a dataset of labeled sentences, you can train a standard classifier using the features produced by the Funnel Transformer model as inputs.
Note: This model does not contain the decoder, so it outputs hidden states with a sequence length of one-fourth of the input. It is suitable for tasks requiring a summary of the sentence (such as sentence classification), but not if you need one output per initial token; use the `funnel-transformer/medium` checkpoint in that case.
Intended uses & limitations
You can use the raw model to extract a vector representation of a given text, but it's mostly intended to be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for fine-tuned versions on a task that interests you.
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation, you should look at models like GPT2.
Training data
The Funnel Transformer model was pretrained on the BookCorpus, English Wikipedia, Clue Web, GigaWord, and Common Crawl.
BibTeX entry and citation info
```bibtex
@misc{dai2020funneltransformer,
    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
    year={2020},
    eprint={2006.03236},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
📄 License
This model is licensed under the Apache 2.0 license.
| Property | Details |
|----------|---------|
| Model Type | Funnel Transformer medium model (B6-3x2-3x2 without decoder) |
| Training Data | BookCorpus, English Wikipedia, Clue Web, GigaWord, Common Crawl |
⚠️ Important Note
This model does not contain the decoder, so it outputs hidden states with a sequence length of one-fourth of the input. It is good for tasks requiring a summary of the sentence (such as sentence classification), but not if you need one output per initial token.
💡 Usage Tip
You can use the raw model to extract a vector representation of a given text, but it's mostly intended to be fine-tuned on a downstream task. Check the [model hub](https://huggingface.co/models?filter=funnel-transformer) for fine-tuned versions.