LongT5 TGlobal Base
LongT5 is a text-to-text Transformer model based on the T5 architecture that uses a transient-global attention mechanism to process long input sequences efficiently.
Downloads 71.38k
Release Date: 4/16/2022
Model Overview
LongT5 is an encoder-decoder Transformer model that handles long sequences (up to 16,384 tokens) efficiently through either a local attention or a transient-global attention mechanism, making it particularly suitable for generation tasks that require processing long texts.
Model Features
Long Sequence Processing Capability
Supports input sequences of up to 16,384 tokens, processed efficiently via sparse attention mechanisms
Transient Global Attention
Uses an innovative transient-global attention mechanism that reduces computational complexity while maintaining performance
Generative Pre-training
Uses a PEGASUS-style generative denoising pre-training objective to strengthen text generation capabilities
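The local vs. transient-global choice described above is exposed as a configuration option in the Hugging Face `transformers` library. A minimal sketch (assuming `transformers` is installed; the attribute names shown are those of `LongT5Config`):

```python
from transformers import LongT5Config

# LongT5 supports two sparse encoder attention variants.
local_cfg = LongT5Config(encoder_attention_type="local")
tglobal_cfg = LongT5Config(encoder_attention_type="transient-global")

# Both variants attend within a fixed local window; the transient-global
# variant additionally builds block-level "global" tokens on the fly.
print(tglobal_cfg.encoder_attention_type)  # transient-global
print(tglobal_cfg.local_radius)            # local attention window radius
print(tglobal_cfg.global_block_size)       # tokens per transient global block
```

Because both variants avoid full quadratic self-attention over the input, encoder cost grows roughly linearly with sequence length, which is what makes the 16,384-token limit practical.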
Model Capabilities
Long text summarization
Long-document question answering
Text-to-text transformation
Use Cases
Text Summarization
Automatic Summarization of Long Documents
Generates concise summaries for long documents such as research papers and legal documents
Excels in long text summarization tasks
Question Answering Systems
Long Document Question Answering
Extracts information from long documents to answer complex questions
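Both use cases above follow the same text-to-text pattern: encode a long document, then generate output tokens. A hedged sketch using the pretrained `google/long-t5-tglobal-base` checkpoint (note: the base model is only pre-trained, so a checkpoint fine-tuned on a summarization dataset would be needed for high-quality summaries; this snippet just illustrates the API):

```python
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

# Pretrained base checkpoint; fine-tune for high-quality summaries.
model_name = "google/long-t5-tglobal-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LongT5ForConditionalGeneration.from_pretrained(model_name)

# A stand-in for a long document (real inputs can be up to 16,384 tokens).
document = "LongT5 extends T5 to long inputs with sparse attention. " * 50

inputs = tokenizer(document, max_length=16384, truncation=True,
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(summary)
```

For question answering, the same pipeline applies with the question prepended to the document text, since LongT5 casts every task as text-to-text generation.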