Nystromformer 4096
Developed by uw-madison
A long-sequence Nyströmformer language model trained on the WikiText-103 v1 dataset, supporting input sequences of up to 4096 tokens
Downloads: 74
Release Time: 4/18/2022
Model Overview
A Transformer variant that uses the Nyström method to approximate self-attention, enabling efficient long-sequence text processing by reducing the cost of the attention computation
Model Features
Long Sequence Processing
Supports input sequences of up to 4096 tokens, overcoming the context-length limits of standard Transformer models
Efficient Attention Mechanism
Uses the Nyström method to approximate the softmax self-attention matrix, reducing the O(n²) cost of full attention to roughly linear in sequence length (see the sketch after this list)
Memory Optimization
Reduces memory usage through low-rank approximation of attention matrices
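
To make the mechanism concrete, below is a minimal NumPy sketch of the Nyström attention approximation. The segment-mean landmarks and the three small softmax kernels follow the general Nyströmformer recipe, but the exact pseudoinverse (the paper approximates it iteratively), the lack of masking, and the assumption that the sequence length divides evenly by the landmark count are simplifications introduced here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, num_landmarks=64):
    """Nystrom-approximated softmax attention for one head.

    Q, K, V: (n, d) arrays. Assumes n is divisible by num_landmarks
    (the paper pads sequences instead of requiring this).
    """
    n, d = Q.shape
    m = num_landmarks
    scale = 1.0 / np.sqrt(d)

    # Landmarks: segment means of the query and key rows (m << n).
    Q_lm = Q.reshape(m, n // m, d).mean(axis=1)
    K_lm = K.reshape(m, n // m, d).mean(axis=1)

    # Three small kernels replace the full (n, n) attention matrix:
    F = softmax(Q @ K_lm.T * scale)      # (n, m)
    A = softmax(Q_lm @ K_lm.T * scale)   # (m, m)
    B = softmax(Q_lm @ K.T * scale)      # (m, n)

    # softmax(QK^T) ~= F @ pinv(A) @ B. Using the exact pseudoinverse
    # here; the paper approximates it iteratively to stay GPU-friendly.
    return F @ np.linalg.pinv(A) @ (B @ V)   # (n, d)

# Example: a 4096-token sequence never materializes a 4096 x 4096 matrix.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4096, 64)) for _ in range(3))
out = nystrom_attention(Q, K, V)
print(out.shape)  # (4096, 64)
```

Because only (n × m) and (m × m) matrices are ever formed, time and memory scale linearly in n for a fixed landmark count, which is what makes the 4096-token window practical.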
Model Capabilities
Long-text language modeling
Context-aware text generation
Document-level semantic understanding
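
As a quick orientation, here is a minimal usage sketch. It assumes the checkpoint is published on the Hugging Face Hub under the id uw-madison/nystromformer-4096 and that the installed transformers version (4.18 or later) includes Nyströmformer support; adjust the id if the hosting differs.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "uw-madison/nystromformer-4096"  # assumed Hub checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).eval()

text = f"Paris is the {tokenizer.mask_token} of France."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the top prediction at the masked position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))
```

The checkpoint is a masked language model, so fill-in-the-blank prediction is the most direct way to exercise it; downstream tasks would fine-tune on top of the encoder.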
Use Cases
Text Generation
Long Document Auto-completion
Generates coherent continuation text conditioned on a long preceding context
Maintains semantic consistency across long spans of text
Language Model Research
Long-sequence Modeling Benchmark
Evaluates how well the model captures long-range dependencies; a probing sketch follows below
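
There is no single official protocol here, but one simple probe is to mask a token near the end of a document and measure how the model's confidence in the true token changes as the visible left context grows toward the 4096-token window. The sketch below is illustrative only: the file name is a placeholder, and truncating windows by slicing (which drops the leading special token) is a simplification.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "uw-madison/nystromformer-4096"  # assumed Hub checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).eval()

def true_token_prob(ids, pos):
    """Probability the model assigns to the original token at `pos`."""
    masked = ids.clone()
    true_id = masked[0, pos].item()
    masked[0, pos] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=masked).logits
    return logits[0, pos].softmax(-1)[true_id].item()

# "long_document.txt" is a placeholder for any document longer than 4096 tokens.
ids = tokenizer(open("long_document.txt").read(), return_tensors="pt",
                truncation=True, max_length=4096).input_ids

# Mask the second-to-last token and grow the visible left context.
for ctx in (128, 512, 2048, ids.shape[1]):
    window = ids[:, -ctx:]
    print(ctx, true_token_prob(window, window.shape[1] - 2))
```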