
YOSO-4096

Developed by uw-madison
YOSO is an efficient Transformer variant that reduces self-attention complexity from quadratic to linear through a Bernoulli-sampling attention mechanism, and it supports sequence lengths of up to 4096 tokens.
Downloads 2,072
Release Time: 3/2/2022

Model Overview

The YOSO model is designed for long-sequence masked language modeling. It uses a Bernoulli-sampling attention mechanism built on an improved Locality-Sensitive Hashing (LSH) scheme to substantially reduce computational cost.
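A minimal usage sketch for masked language modeling, assuming the standard Hugging Face transformers fill-mask pipeline and the uw-madison/yoso-4096 checkpoint identifier; verify the exact API against the installed library version:

```python
from transformers import pipeline

# Load the checkpoint into the standard fill-mask pipeline.
unmasker = pipeline("fill-mask", model="uw-madison/yoso-4096")

# Prints the top predictions for the masked token, e.g. "capital".
print(unmasker("Paris is the [MASK] of France."))
```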

Model Features

Linear Complexity Attention
Reduces the standard Transformer's O(n²) self-attention complexity to O(n) through Bernoulli sampling (see the sketch after this list).
Long Sequence Support
Handles sequences of up to 4096 tokens, far beyond the 512-token limit of standard Transformer models.
GPU-Optimized Design
An improved LSH implementation optimized specifically for GPU architectures.
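A conceptual sketch of the Bernoulli-sampling idea behind the linear-complexity attention, not the model's actual GPU kernel: attention weights are approximated by the collision rate of sign-random-projection (LSH) hashes between each query and key. For clarity, this dense version still materializes an n×n matrix; the real implementation counts collisions without doing so, which is what yields linear cost. All names and hyperparameters below are illustrative.

```python
import torch

def yoso_style_attention(q, k, v, num_hashes=64):
    # q, k: (seq_len, dim); v: (seq_len, dim_v). Normalize so the hash
    # collision probability depends only on the angle between q and k.
    q = torch.nn.functional.normalize(q, dim=-1)
    k = torch.nn.functional.normalize(k, dim=-1)

    # Random hyperplanes define the LSH family: each hash is the sign of
    # a random projection.
    planes = torch.randn(q.size(-1), num_hashes)
    q_codes = (q @ planes) > 0   # (seq_len, num_hashes) boolean codes
    k_codes = (k @ planes) > 0

    # The fraction of matching hash bits estimates the Bernoulli success
    # probability that stands in for the softmax attention weight.
    collisions = (q_codes.unsqueeze(1) == k_codes.unsqueeze(0)).float().mean(-1)
    weights = collisions / collisions.sum(-1, keepdim=True)  # row-normalize
    return weights @ v

# Toy usage: random queries, keys, and values.
q = torch.randn(128, 64)
k = torch.randn(128, 64)
v = torch.randn(128, 64)
print(yoso_style_attention(q, k, v).shape)  # torch.Size([128, 64])
```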

Model Capabilities

Long-text semantic understanding
Masked word prediction
Contextual feature extraction

Use Cases

Natural Language Processing
Text Completion
Predicts masked tokens in text.
For example, the model predicts 'capital' for 'Paris is the [MASK] of France' with reasonable confidence.
Long Document Analysis
Processes documents of up to 4096 tokens (see the sketch after this list).
Outperforms other efficient-attention methods on the Long Range Arena (LRA) benchmark.
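A sketch of long-document feature extraction, assuming the generic AutoTokenizer/AutoModel interface works with this checkpoint; the 4096-token limit reflects the model's stated maximum sequence length:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("uw-madison/yoso-4096")
model = AutoModel.from_pretrained("uw-madison/yoso-4096")

# Placeholder document; in practice this can be up to 4096 tokens long.
long_text = "Replace this with a long document."

inputs = tokenizer(long_text, truncation=True, max_length=4096, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings for every token in the document.
print(outputs.last_hidden_state.shape)
```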