Transfo-XL WT103

Developed by transfo-xl
Transformer-XL is a causal Transformer architecture that uses relative position encoding. It can capture longer context by reusing previously computed hidden states, making it suitable for text generation tasks.
Downloads 4,498
Release Time: 3/2/2022

Model Overview

This model is trained on the WikiText-103 dataset and is primarily used for English text generation. It employs an adaptive softmax with tied input/output embeddings and a segment-level memory mechanism to enhance long-text processing.
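As a quick usage sketch, hedged by one assumption: Transfo-XL was deprecated and later removed from recent `transformers` releases, so an older release that still ships it (e.g. 4.35) and the `sacremoses` tokenizer dependency are assumed. The checkpoint id `transfo-xl-wt103` is the Hub id this card describes.

```python
# Minimal loading sketch; assumes an older transformers release that
# still includes Transfo-XL, plus the sacremoses dependency.
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

inputs = tokenizer("Transformer-XL models long sequences", return_tensors="pt")
outputs = model(**inputs)
# Besides prediction scores, the output carries `mems`: the cached hidden
# states behind the long-text memory mechanism described below.
print(len(outputs.mems), outputs.mems[0].shape)
```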

Model Features

Long-text memory mechanism
Achieves cross-segment memory by reusing previously computed hidden states, effectively capturing long-range dependencies (see the sketch after this list).
Relative position encoding
Uses sinusoidal embeddings of relative distances for position encoding, improving the model's sensitivity to positional information (see the formula after this list).
Adaptive softmax
Employs an adaptive softmax with tied input and output embeddings to improve computational efficiency over the large WikiText-103 vocabulary.
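A minimal sketch of the cross-segment memory mechanism, under the same library-version assumption as above: the `mems` returned for one segment are passed back in with the next, so attention can reach tokens beyond the current input window. The segment length of 16 below is illustrative.

```python
import torch
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

text = "The quick brown fox jumps over the lazy dog . " * 8  # toy long input
ids = tokenizer(text, return_tensors="pt")["input_ids"]

mems = None
# Process the sequence in fixed-size segments, carrying the memory forward
# so each segment can attend to hidden states from earlier segments.
for segment in torch.split(ids, 16, dim=1):
    out = model(input_ids=segment, mems=mems)
    mems = out.mems  # cached hidden states reused by the next segment
```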
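For the relative position encoding, the Transformer-XL paper (Dai et al., 2019) decomposes the attention score between query position i and key position j as:

```latex
A^{\mathrm{rel}}_{i,j}
  = E_{x_i}^{\top} W_q^{\top} W_{k,E}\, E_{x_j}   % content-content
  + E_{x_i}^{\top} W_q^{\top} W_{k,R}\, R_{i-j}   % content-position
  + u^{\top} W_{k,E}\, E_{x_j}                    % global content bias
  + v^{\top} W_{k,R}\, R_{i-j}                    % global position bias
```

where R_{i-j} is a sinusoidal embedding of the relative distance i-j and u, v are learned biases. Because only relative offsets enter the score, hidden states cached from earlier segments can be reused without positional conflicts.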

Model Capabilities

English text generation
Long-text sequence modeling

Use Cases

Content creation
Automatic text continuation
Generates a coherent continuation from a given prompt.
Can generate coherent text of 500-1000 tokens.
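A hedged continuation sketch (the prompt and sampling settings are illustrative, not from this card; same library-version assumption as above):

```python
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

prompt = "The history of natural language processing began"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

# Sample a continuation of the prompt with top-k sampling.
output_ids = model.generate(input_ids, max_length=120, do_sample=True, top_k=50)
print(tokenizer.decode(output_ids[0]))
```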
Educational research
Language model research
Used to study modeling methods for long-text dependencies.
Achieves a perplexity of 18.3 on WikiText-103.
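A rough evaluation sketch on a single excerpt (the text below is a placeholder; reproducing the 18.3 figure requires the full WikiText-103 test set and the paper's long-memory evaluation settings):

```python
import math
import torch
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
model.eval()

text = "Valkyria Chronicles III is a tactical role playing game ."  # placeholder excerpt
ids = tokenizer(text, return_tensors="pt")["input_ids"]

with torch.no_grad():
    out = model(input_ids=ids, labels=ids)

# Transfo-XL returns per-token negative log-likelihoods in `out.losses`;
# exponentiating their mean gives the perplexity of this excerpt.
print("perplexity:", math.exp(out.losses.mean().item()))
```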