🚀 LWM 1.1
LWM 1.1 is an updated pre-trained model for feature extraction in wireless channels, offering enhanced scalability, generalization, and efficiency.
🎥 LWM Tutorial Series
Explore LWM concepts and applications in this compact video series.
How is LWM 1.1 built?
LWM 1.1 is a transformer-based architecture designed to model spatial and frequency dependencies in wireless channel data. It uses an enhanced Masked Channel Modeling (MCM) pretraining approach with an increased masking ratio to improve feature learning and generalization. The introduction of 2D patch segmentation allows the model to jointly process spatial (antenna) and frequency (subcarrier) relationships, providing a more structured representation of the channel. Additionally, bucket-based batching is employed to handle variable-sized inputs without excessive padding, ensuring memory-efficient training and inference. These modifications enable LWM 1.1 to extract meaningful embeddings from a wide range of wireless scenarios, improving its applicability across different system configurations.
What does LWM 1.1 offer?
LWM 1.1 serves as a general-purpose feature extractor for wireless communication and sensing tasks. Pretrained on an expanded and more diverse dataset, it effectively captures channel characteristics across various environments, including dense urban areas, simulated settings, and real-world deployments. The model's increased capacity and optimized pretraining strategy improve the quality of extracted representations, enhancing its applicability for downstream tasks.
How is LWM 1.1 used?
LWM 1.1 is designed for seamless integration into wireless communication pipelines as a pre-trained embedding extractor. By processing raw channel data, the model generates structured representations that encode spatial, frequency, and propagation characteristics. These embeddings can be directly used for downstream tasks, reducing the need for extensive labeled data while improving model efficiency and generalization across different system configurations.
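To make this workflow concrete, here is a minimal, self-contained sketch of the raw-channel-in, embeddings-out pattern. The encoder below is a randomly initialized stand-in, not the actual pre-trained LWM 1.1 network, and the patch size and projection layer are illustrative assumptions:

```python
import torch
import torch.nn as nn

EMBED_DIM = 128  # LWM 1.1 embedding size

# Stand-in encoder: randomly initialized here only to keep the sketch
# self-contained; in practice, this is where the pre-trained LWM weights go.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=EMBED_DIM, nhead=8, batch_first=True),
    num_layers=4,
).eval()
patch_proj = nn.Linear(32, EMBED_DIM)     # 32 real values per patch (assumed)
cls_token = torch.zeros(1, 1, EMBED_DIM)  # learned in the real model

@torch.no_grad()
def extract_embeddings(channel: torch.Tensor):
    """Map a raw complex channel (N antennas x SC subcarriers) to embeddings."""
    n, sc = channel.shape
    # 2D patchify: 4x4 antenna-subcarrier tiles, real/imag stacked -> 32 values.
    real = torch.view_as_real(channel)                   # (N, SC, 2)
    tiles = real.reshape(n // 4, 4, sc // 4, 4, 2)
    patches = tiles.permute(0, 2, 1, 3, 4).reshape(1, -1, 32)
    tokens = torch.cat([cls_token, patch_proj(patches)], dim=1)
    out = encoder(tokens)                                # (1, 1 + P, EMBED_DIM)
    return out[:, 0], out[:, 1:]    # compact CLS embedding, per-patch embeddings

H = torch.randn(32, 32, dtype=torch.cfloat)  # raw channel: 32 antennas, 32 subcarriers
cls_emb, patch_embs = extract_embeddings(H)
print(cls_emb.shape, patch_embs.shape)       # torch.Size([1, 128]) torch.Size([1, 64, 128])
```

The CLS output serves as a compact channel summary for sample-level tasks, while the per-patch outputs form the high-dimensional channel embeddings referenced below.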
Advantages of Using LWM 1.1
- Enhanced Flexibility: Handles diverse channel configurations with no size limitations.
- Refined Embeddings: Improved feature extraction through advanced pretraining and increased model capacity.
- Efficient Processing: Memory-optimized with bucket-based batching for variable-sized inputs.
- Broad Generalization: Trained on a larger, more diverse dataset for reliable performance across environments.
- Task Adaptability: Fine-tuning options enable seamless integration into a wide range of applications.
For example, the following figure demonstrates the advantages of using LWM-based highly compact CLS embeddings and high-dimensional channel embeddings over raw channels for the LoS/NLoS classification task. The raw dataset is derived from channels of size (32, 32) between BS 3 and 8,299 users in the densified Denver scenario of the DeepMIMO dataset.
Figure: F1-score comparison of models trained on raw wireless channels and on their LWM embeddings for LoS/NLoS classification.
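As a sketch of how such a comparison can be set up (using random stand-in features and labels rather than the DeepMIMO data, so the printed scores are meaningless; only the evaluation protocol is illustrated):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: in the actual experiment, these would be the flattened raw
# DeepMIMO channels and their LWM CLS embeddings, with LoS/NLoS labels.
n_users = 8299
raw_channels = rng.standard_normal((n_users, 32 * 32 * 2))  # real+imag, flattened
cls_embeddings = rng.standard_normal((n_users, 128))        # LWM 1.1 CLS size
labels = rng.integers(0, 2, size=n_users)                   # 1 = LoS, 0 = NLoS

for name, X in [("raw channels", raw_channels), ("CLS embeddings", cls_embeddings)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"{name}: F1 = {f1_score(y_te, clf.predict(X_te)):.3f}")
```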
🔧 Technical Details
Key Improvements in LWM 1.1
1️⃣ Expanded Input Flexibility
- Removed Fixed Channel Size Constraints: Supports multiple (N, SC) configurations instead of being restricted to (32, 32).
- Increased Sequence Length: Extended from 128 to 512, allowing the model to process larger input dimensions efficiently.
2️⃣ Enhanced Dataset and Pretraining
- Broader Dataset Coverage: Increased the number of training scenarios from 15 to 140, improving generalization across environments.
- Higher Masking Ratio in MCM: Increased from 15% to 40%, making the Masked Channel Modeling (MCM) task more challenging and more effective for feature extraction (a masking sketch follows this list).
- Larger Pretraining Dataset: Expanded from 820K to 1.05M samples for more robust representation learning.
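The core of the MCM objective is simple to sketch: hide a fraction of the patch tokens and train the model to reconstruct them. The shapes and the zero mask token below are illustrative assumptions, not the exact LWM implementation:

```python
import torch

MASK_RATIO = 0.40  # LWM 1.1 (up from 0.15 in LWM 1.0)

def mask_patches(patches: torch.Tensor):
    """Randomly mask a fraction of patch tokens for Masked Channel Modeling.

    patches: (batch, num_patches, dim). Returns the masked sequence and the
    boolean mask marking which positions the model must reconstruct.
    """
    b, p, _ = patches.shape
    num_masked = int(MASK_RATIO * p)
    mask_idx = torch.rand(b, p).topk(num_masked, dim=1).indices  # positions to hide
    mask = torch.zeros(b, p, dtype=torch.bool)
    mask.scatter_(1, mask_idx, True)
    masked = patches.clone()
    masked[mask] = 0.0  # simple zero mask token (illustrative)
    return masked, mask

patches = torch.randn(4, 64, 128)  # 4 channels, 64 patches each
masked, mask = mask_patches(patches)
# The pretraining loss is reconstruction error on the hidden positions only,
# e.g. mse_loss(model(masked)[mask], patches[mask]).
```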
3️⃣ Improved Model Architecture
- Increased Model Capacity: Parameter count expanded from 600K to 2.5M, enhancing representational power.
- 2D Patch Segmentation: Instead of segmenting channels along a single dimension (antennas or subcarriers), patches now span both antennas and subcarriers, improving spatial-frequency feature learning (see the sketch below).
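The distinction between 1D and 2D segmentation, as a minimal sketch (patch shapes are illustrative, not the actual LWM 1.1 patch dimensions):

```python
import torch

H = torch.randn(32, 64)  # channel grid: 32 antennas x 64 subcarriers

# 1D segmentation (LWM 1.0 style): slice along one axis only,
# e.g. groups of 4 subcarriers spanning all antennas per patch.
patches_1d = H.reshape(32, 16, 4).permute(1, 0, 2).reshape(16, -1)  # 16 patches of 128 values

# 2D segmentation (LWM 1.1 style): tiles spanning both axes,
# e.g. 4 antennas x 4 subcarriers per patch.
patches_2d = (
    H.reshape(8, 4, 16, 4)    # (antenna blocks, 4, subcarrier blocks, 4)
     .permute(0, 2, 1, 3)     # group the two block axes together
     .reshape(8 * 16, 4 * 4)  # 128 patches of 16 values each
)
print(patches_1d.shape, patches_2d.shape)  # torch.Size([16, 128]) torch.Size([128, 16])
```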
4️⃣ Optimized Training and Efficiency
- Adaptive Learning Rate Schedule: Implemented AdamW with Cosine Decay, improving convergence stability (see the optimizer sketch after this list).
- Computational Efficiency: Reduced the number of attention heads per layer from 12 to 8, balancing computational cost with feature extraction capability.
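This optimizer setup maps directly onto standard PyTorch components; here is a sketch with illustrative hyperparameters (the actual learning rates and step counts used for LWM 1.1 are not stated here):

```python
import torch

model = torch.nn.Linear(128, 128)  # placeholder for the LWM encoder

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for step in range(1000):
    loss = model(torch.randn(32, 128)).pow(2).mean()  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine-decayed learning rate
```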
Comparison of LWM Versions
| Property | LWM 1.0 | LWM 1.1 |
|---|---|---|
| Channel Size Limitation | Fixed at (32, 32) | Supports multiple (N, SC) pairs |
| Sequence Length Support | 128 (16-dimensional) | 512 (32-dimensional) |
| Pre-training Samples | 820K | 1.05M |
| Pre-training Scenarios | 15 | 140 |
| Masking Ratio | 15% | 40% |
| Embedding Size | 64 | 128 |
| Number of Parameters | 600K | 2.5M |
| Segmentation | 1D | 2D |
📚 Documentation
Detailed Changes in LWM 1.1
No Channel Size Limitation
In LWM 1.0, the model was pre-trained on a single (N, SC) = (32, 32) pair, which limited its generalization to other channel configurations. Wireless communication systems in the real world exhibit vast variability in the number of antennas (N) at base stations and subcarriers (SC). To address this limitation, LWM 1.1 was pre-trained on 20 distinct (N, SC) pairs, ranging from smaller setups like (8, 32) to more complex setups like (128, 64). This variety enables the model to effectively handle diverse channel configurations and ensures robust generalization without overfitting to specific configurations.
To handle variable-sized inputs efficiently, we implemented bucket-based batching, where inputs of similar sizes are grouped together. For example, channels with sizes (32, 64) and (16, 128) are placed in the same bucket, avoiding the excessive padding common in traditional batching approaches. This not only saves memory but also ensures computational efficiency during training. Furthermore, validation samples were drawn as 20% of each bucket, maintaining a balanced evaluation process across all input sizes.
This approach eliminates the rigidity of fixed channel sizes and positions LWM 1.1 as a versatile model capable of adapting to real-world wireless systems with varying configurations.
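A minimal sketch of the grouping logic, assuming total element count (and hence patch-sequence length) as the bucketing key; the actual LWM 1.1 training code may differ:

```python
from collections import defaultdict
import torch

def bucket_batches(channels, batch_size=4):
    """Group variable-sized channels so each batch shares one sequence length.

    Channels like (32, 64) and (16, 128) contain the same number of elements,
    so they yield the same patch-sequence length and land in the same bucket.
    """
    buckets = defaultdict(list)
    for h in channels:
        buckets[h.numel()].append(h)  # key: total elements ~ sequence length
    for same_size in buckets.values():
        for i in range(0, len(same_size), batch_size):
            batch = same_size[i:i + batch_size]
            # All entries in a batch share a length, so no padding is needed.
            yield torch.stack([h.reshape(-1) for h in batch])

channels = [torch.randn(32, 64), torch.randn(16, 128), torch.randn(8, 32),
            torch.randn(32, 64), torch.randn(8, 32)]
for batch in bucket_batches(channels, batch_size=2):
    print(batch.shape)  # (2, 2048), (1, 2048), (2, 256)
```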
Larger and More Diverse Pretraining Dataset
Generalization is a critical aspect of any foundation model. In LWM 1.1, we significantly expanded the training dataset to cover more diverse scenarios and environments. We added seven new city scenarios—Charlotte, Denver, Oklahoma, Indianapolis, Fort Worth, Santa Clara, and San Diego—to enrich the model's exposure to a variety of real-world conditions.