Transfo-XL WT103

Developed by transfo-xl
Transformer-XL is a causal Transformer architecture that uses relative position encoding. It can capture longer context by reusing previously computed hidden states, making it suitable for text generation tasks.
Downloads 4,498
Release Time: 3/2/2022

Model Overview

This model is trained on the WikiText-103 dataset and is primarily used for English text generation. It employs an adaptive softmax with tied input/output embeddings and a segment-level memory mechanism to enhance long-text processing.
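As a quick usage sketch, hedged by one assumption: Transfo-XL was deprecated and later removed from recent `transformers` releases, so an older release that still ships it (e.g. 4.35) and the `sacremoses` tokenizer dependency are assumed. The checkpoint id `transfo-xl-wt103` is the Hub id this card describes.

```python
# Minimal loading sketch; assumes an older transformers release that
# still includes Transfo-XL, plus the sacremoses dependency.
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

inputs = tokenizer("Transformer-XL models long sequences", return_tensors="pt")
outputs = model(**inputs)
# Besides prediction scores, the output carries `mems`: the cached hidden
# states behind the long-text memory mechanism described below.
print(len(outputs.mems), outputs.mems[0].shape)
```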

Model Features

Long-text memory mechanism
Achieves cross-segment memory by reusing previously computed hidden states, effectively capturing long-range dependencies (see the sketch after this list).
Relative position encoding
Uses sinusoidal embeddings of relative distances for position encoding, improving the model's sensitivity to positional information (see the formula after this list).
Adaptive softmax
Employs an adaptive softmax with tied input and output embeddings to improve computational efficiency over the large WikiText-103 vocabulary.
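A minimal sketch of the cross-segment memory mechanism, under the same library-version assumption as above: the `mems` returned for one segment are passed back in with the next, so attention can reach tokens beyond the current input window. The segment length of 16 below is illustrative.

```python
import torch
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

text = "The quick brown fox jumps over the lazy dog . " * 8  # toy long input
ids = tokenizer(text, return_tensors="pt")["input_ids"]

mems = None
# Process the sequence in fixed-size segments, carrying the memory forward
# so each segment can attend to hidden states from earlier segments.
for segment in torch.split(ids, 16, dim=1):
    out = model(input_ids=segment, mems=mems)
    mems = out.mems  # cached hidden states reused by the next segment
```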
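For the relative position encoding, the Transformer-XL paper (Dai et al., 2019) decomposes the attention score between query position i and key position j as:

```latex
A^{\mathrm{rel}}_{i,j}
  = E_{x_i}^{\top} W_q^{\top} W_{k,E}\, E_{x_j}   % content-content
  + E_{x_i}^{\top} W_q^{\top} W_{k,R}\, R_{i-j}   % content-position
  + u^{\top} W_{k,E}\, E_{x_j}                    % global content bias
  + v^{\top} W_{k,R}\, R_{i-j}                    % global position bias
```

where R_{i-j} is a sinusoidal embedding of the relative distance i-j and u, v are learned biases. Because only relative offsets enter the score, hidden states cached from earlier segments can be reused without positional conflicts.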

Model Capabilities

English text generation
Long-text sequence modeling

Use Cases

Content creation
Automatic text continuation
Generates a coherent continuation from a given prompt.
Can generate coherent text of 500-1000 tokens.
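A hedged continuation sketch (the prompt and sampling settings are illustrative, not from this card; same library-version assumption as above):

```python
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

prompt = "The history of natural language processing began"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

# Sample a continuation of the prompt with top-k sampling.
output_ids = model.generate(input_ids, max_length=120, do_sample=True, top_k=50)
print(tokenizer.decode(output_ids[0]))
```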
Educational research
Language model research
Used to study modeling methods for long-text dependencies.
Achieves a perplexity of 18.3 on WikiText-103.
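A rough evaluation sketch on a single excerpt (the text below is a placeholder; reproducing the 18.3 figure requires the full WikiText-103 test set and the paper's long-memory evaluation settings):

```python
import math
import torch
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
model.eval()

text = "Valkyria Chronicles III is a tactical role playing game ."  # placeholder excerpt
ids = tokenizer(text, return_tensors="pt")["input_ids"]

with torch.no_grad():
    out = model(input_ids=ids, labels=ids)

# Transfo-XL returns per-token negative log-likelihoods in `out.losses`;
# exponentiating their mean gives the perplexity of this excerpt.
print("perplexity:", math.exp(out.losses.mean().item()))
```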