Funnel Transformer medium model (B6-3x2-3x2 without decoder)
This is a pre-trained model on the English language, using a similar objective to ELECTRA. It was introduced in [this paper](https://arxiv.org/abs/2006.03236) and first released in [this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not distinguish between "english" and "English".
Disclaimer: The team releasing Funnel Transformer did not write a model card for this model, so this model card has been written by the Hugging Face team.
🚀 Quick Start
The Funnel Transformer model can be used to extract features for downstream tasks. You can use the raw model to get a vector representation of text, but it is mainly designed for fine-tuning on specific tasks. Check the [model hub](https://huggingface.co/models?filter=funnel-transformer) for fine-tuned versions.
✨ Features
- Self-supervised pretraining: Pretrained on a large English corpus in a self-supervised way, learning an inner representation of the English language.
- Feature extraction: Can extract useful features for downstream tasks such as sequence classification, token classification, or question answering.
- No decoder: Outputs hidden states with a sequence length of one-fourth of the input, suitable for tasks requiring sentence summaries.
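The "one-fourth" output length comes from the funnel shape of the encoder: the sequence is pooled (stride 2) between blocks, so two pooling steps halve the length twice. As a rough illustration of that length arithmetic only (plain Python, not the actual model code, which pools attention queries rather than raw vectors):

```python
# Illustrative sketch of funnel-style sequence shortening (not the real model code):
# stride-2 pooling between blocks halves the sequence length, and two pooling
# steps leave an output length of one-fourth of the input.

def mean_pool_stride2(hidden):
    """Average each pair of adjacent positions (stride-2 mean pooling)."""
    return [(hidden[i] + hidden[i + 1]) / 2 for i in range(0, len(hidden) - 1, 2)]

seq = [float(i) for i in range(16)]            # pretend hidden states for 16 input tokens
after_block1 = mean_pool_stride2(seq)          # length 8
after_block2 = mean_pool_stride2(after_block1) # length 4 = one-fourth of the input

print(len(seq), len(after_block1), len(after_block2))  # 16 8 4
```

Without a decoder to upsample back to the input length, the model's final hidden states stay at this shortened length, which is why one output per input token is not available from this checkpoint.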
📦 Installation
No specific installation steps are provided in the original document.
💻 Usage Examples
Basic Usage
Here is how to use this model to get the features of a given text in PyTorch:
```python
from transformers import FunnelTokenizer, FunnelBaseModel

tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/medium-base")
model = FunnelBaseModel.from_pretrained("funnel-transformer/medium-base")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
```
Advanced Usage
Here is the usage in TensorFlow:
```python
from transformers import FunnelTokenizer, TFFunnelBaseModel

tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/medium-base")
model = TFFunnelBaseModel.from_pretrained("funnel-transformer/medium-base")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="tf")
output = model(encoded_input)
```
📚 Documentation
Model description
Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. It was pretrained on raw texts only, with an automatic process to generate inputs and labels from those texts. A small language model corrupts the input texts and serves as a generator of inputs for this model. The pretraining objective is to predict which tokens are original and which have been replaced, somewhat like GAN training.
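As a toy illustration of this replaced-token-detection objective (hypothetical tokens, not the actual pretraining pipeline): a small generator swaps out some tokens, and the model is trained to label each position as original or replaced:

```python
# Toy sketch of ELECTRA-style replaced-token detection (hypothetical example,
# not the actual Funnel Transformer pretraining code).

original  = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]  # generator swapped "cooked" -> "ate"

# The model's target: 0 = token kept from the original, 1 = token was replaced.
labels = [int(o != c) for o, c in zip(original, corrupted)]
print(labels)  # [0, 0, 1, 0, 0]
```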
The model learns an inner representation of the English language, which can be used to extract features for downstream tasks. For example, if you have a dataset of labeled sentences, you can train a standard classifier using the features produced by the Funnel Transformer model as inputs.
Note: This model does not contain the decoder, so it outputs hidden states with a sequence length of one-fourth of the input. It is suitable for tasks requiring a summary of the sentence (such as sentence classification), but not if you need one output per initial token; use the `funnel-transformer/medium` checkpoint in that case.
Intended uses & limitations
You can use the raw model to extract a vector representation of a given text, but it's mostly intended to be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for fine-tuned versions on a task that interests you.
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation, you should look at models like GPT2.
Training data
The Funnel Transformer model was pretrained on the BookCorpus, English Wikipedia, Clue Web, GigaWord, and Common Crawl.
BibTeX entry and citation info
```bibtex
@misc{dai2020funneltransformer,
    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
    year={2020},
    eprint={2006.03236},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
📄 License
This model is licensed under the Apache 2.0 license.
| Property | Details |
|----------|---------|
| Model Type | Funnel Transformer medium model (B6-3x2-3x2 without decoder) |
| Training Data | BookCorpus, English Wikipedia, Clue Web, GigaWord, Common Crawl |
⚠️ Important Note
This model does not contain the decoder, so it outputs hidden states with a sequence length of one-fourth of the input. It is good for tasks requiring a summary of the sentence (such as sentence classification), but not if you need one output per initial token.
💡 Usage Tip
You can use the raw model to extract a vector representation of a given text, but it's mostly intended to be fine-tuned on a downstream task. Check the [model hub](https://huggingface.co/models?filter=funnel-transformer) for fine-tuned versions.