Lola_v1 Open-Source Multilingual Large Model - Supports Natural Language Generation and Understanding in Over 160 Languages

Lola V1

Developed by dice-research

LOLA is a massively multilingual large language model based on a sparse Mixture-of-Experts (MoE) Transformer architecture. It supports more than 160 languages and performs competitively on natural language generation and understanding tasks.

Large Language Model

Transformers

Other

Tags: Ultra-large-scale multilingual, Mixture-of-Experts architecture, 160+ language support

Downloads 867

Release Time : 4/2/2024

Model Overview

LOLA is an open-source multilingual large model that adopts a GPT2-style decoder-only architecture combined with sparse Mixture-of-Experts technology, supporting text generation tasks in over 160 languages.

Model Features

Multilingual support

Supports over 160 languages, with competitive performance on multilingual natural language processing tasks

Mixture-of-Experts architecture

Employs a sparse Mixture-of-Experts (MoE) architecture with 16 experts, which raises total capacity to 7.4 billion parameters while only 1.3 billion parameters are active per token

Open-source and reproducible

Fully open-source, promoting research reproducibility and laying the foundation for future studies

Computationally efficient

Optimizes computational resource usage through expert routing mechanisms, activating only a subset of parameters per token
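
The routing idea behind the last two features can be illustrated with a toy example. The sketch below assumes top-1 gating (each token is sent to the single highest-scoring expert), which this page does not state explicitly. The router weights, the toy experts, and the names `route_top1` and `moe_layer` are hypothetical and are not taken from the LOLA code base.

```python
import math
import random

NUM_EXPERTS = 16  # number of experts stated on this page
DIM = 4           # toy hidden size, chosen only for illustration


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(token_vec, router_weights):
    """Return (expert_index, gate_probability) for a single token."""
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    probs = softmax(logits)
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs[expert]


def moe_layer(token_vec, router_weights, experts):
    """Run only the selected expert and scale its output by the gate value."""
    expert, gate = route_top1(token_vec, router_weights)
    expert_out = experts[expert](token_vec)  # the other experts are never called
    return [gate * y for y in expert_out], expert


# Hypothetical toy setup: random router, experts that just scale their input.
random.seed(0)
router_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [lambda v, s=i + 1: [s * x for x in v] for i in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_layer(token, router_weights, experts)
print(f"token routed to expert {chosen} of {NUM_EXPERTS}")
```

Because only one expert's weights are touched per token, the compute per token stays close to that of a much smaller dense model even though all experts must be stored.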

Model Capabilities

Multilingual text generation

Causal language modeling

Natural language understanding

Use Cases

Text generation

Multilingual text completion

Generates coherent subsequent content based on given text fragments

Example: Input 'The quick brown fox' outputs 'The quick brown fox jumps over the lazy dog.'

Language research

Cross-linguistic pattern analysis

Studies implicit phylogenetic (language-family) patterns across languages

The authors' analysis shows that the learned expert-routing mechanism exploits implicit phylogenetic linguistic patterns

🚀 LOLA — An Open-Source Massively Multilingual Large Language Model

LOLA is a large language model trained on over 160 languages, using a sparse Mixture-of-Experts Transformer architecture. It offers competitive performance in NLP tasks and promotes reproducibility for future research.

🚀 Quick Start

This is a pretrained causal language model, so out of the box it is suited to text generation only. Other downstream tasks require further fine-tuning.

How to use

You can use this model directly with a pipeline for text generation.

>>> from transformers import pipeline

>>> generator = pipeline('text-generation', model="dice-research/lola_v1", trust_remote_code=True)
>>> generator("The quick brown fox", max_length=13)
[{'generated_text': 'The quick brown fox jumps over the lazy dog.'}]

To use top-k sampling, set do_sample to True.
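
As a minimal sketch of what that looks like in practice, the helper below forwards `do_sample=True` and a `top_k` value to a text-generation pipeline. The helper name `sample_completions` and the default values (`top_k=50`, three sequences, `max_length=30`) are assumptions made for illustration, not settings recommended by the model authors.

```python
def sample_completions(generator, prompt, top_k=50, num_return_sequences=3, max_length=30):
    """Draw several sampled continuations from a text-generation pipeline."""
    outputs = generator(
        prompt,
        do_sample=True,                 # switch from greedy decoding to sampling
        top_k=top_k,                    # sample only from the k most likely tokens
        max_length=max_length,          # total length in tokens, prompt included
        num_return_sequences=num_return_sequences,
    )
    return [item["generated_text"] for item in outputs]


# Usage with the real model (downloads the weights, so it is left commented out):
# from transformers import pipeline, set_seed
# set_seed(42)  # makes the sampled outputs repeatable
# generator = pipeline('text-generation', model="dice-research/lola_v1", trust_remote_code=True)
# for text in sample_completions(generator, "The quick brown fox"):
#     print(text)
```

Sampled outputs differ from run to run unless a seed is fixed, so they will generally not match the greedy completion shown above.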

⚠️ Important Note

The tokenizer used in the model comes from mGPT (https://github.com/ai-forever/mgpt)

✨ Features

LOLA is a massively multilingual large language model trained on more than 160 languages using a sparse Mixture-of-Experts Transformer architecture. Our architectural and implementation choices address the challenge of harnessing linguistic diversity while maintaining efficiency and avoiding the common pitfalls of multilinguality. Our analysis of the evaluation results shows competitive performance in natural language generation and understanding tasks. Additionally, we demonstrate how the learned expert-routing mechanism exploits implicit phylogenetic linguistic patterns to potentially alleviate the curse of multilinguality.

📚 Documentation

Model Description

Property	Details
Developed by	DICE Research Group (https://dice-research.org/) @ Paderborn University (https://www.uni-paderborn.de/)
Model Type	GPT2 style (decoder-only) with alternating sparse Mixture-of-Experts layers
Number of Experts	16
Model Size	1.3 Billion (active*) / 7.4 Billion (total)
Language(s) (NLP)	160+
License	CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Repository	https://github.com/dice-group/LOLA

* The number of parameters a model utilizes per token (ref: Fedus et al., 2022; Du et al., 2022). This distinction is crucial for understanding the efficiency and performance of MoE models.
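
To make the active/total distinction concrete, the short calculation below uses only the two parameter counts from the table. The memory estimate additionally assumes 16-bit weights (2 bytes per parameter) and counts weights only; that precision is an assumption for illustration and is not stated on this page.

```python
ACTIVE_PARAMS = 1.3e9   # parameters used per token (from the table)
TOTAL_PARAMS = 7.4e9    # parameters stored across all experts (from the table)
BYTES_PER_PARAM = 2     # assumption: fp16/bf16 weights

# Share of the model that participates in processing any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# All experts must be resident in memory even though few are used per token.
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9

print(f"active share per token: {active_fraction:.1%}")   # 17.6%
print(f"weights in memory (assumed 16-bit): {weights_gb:.1f} GB")  # 14.8 GB
```

In other words, per-token compute scales with the 1.3 billion active parameters, while storage scales with the full 7.4 billion.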

Training Details

Training Framework

DeepSpeed Megatron (https://github.com/microsoft/Megatron-DeepSpeed)
Architecture type: Transformers (Decoder-only) with Mixture-of-Experts (MoE)
Number of Experts: 16
Model Size: 1.3 Billion Dense / 7.4 Billion Sparse

Pretraining Dataset

CulturaX (https://huggingface.co/datasets/uonlp/CulturaX)
Total Tokens: 6.3 Trillion
Total Languages: 167

LOLA v1 Training

Computing cluster: Noctua2 (https://pc2.uni-paderborn.de/hpc-services/available-systems/noctua2)
Number of GPUs: 96x Nvidia A100 (40GB)
Training steps: 296000
Tokens consumed: 465 Billion
Training time: ~19 days
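
A few derived quantities follow from the figures above. The sketch below is back-of-the-envelope arithmetic only: it treats the "~19 days" as exactly 19 days of continuous use of all 96 GPUs, which is an assumption, so the throughput and GPU-hour numbers are rough estimates rather than reported measurements.

```python
STEPS = 296_000
TOKENS_CONSUMED = 465e9   # tokens seen during LOLA v1 training
DATASET_TOKENS = 6.3e12   # total tokens listed for the pretraining dataset
NUM_GPUS = 96
TRAINING_DAYS = 19        # assumption: "~19 days" taken as exactly 19

tokens_per_step = TOKENS_CONSUMED / STEPS
dataset_fraction = TOKENS_CONSUMED / DATASET_TOKENS
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24
tokens_per_gpu_second = TOKENS_CONSUMED / (gpu_hours * 3600)

print(f"tokens per step:        {tokens_per_step:,.0f}")
print(f"share of dataset seen:  {dataset_fraction:.1%}")
print(f"GPU-hours (estimate):   {gpu_hours:,}")
print(f"tokens per GPU-second:  {tokens_per_gpu_second:,.0f}")
```

The share-of-dataset line indicates that training covered well under one pass over the listed 6.3 trillion tokens.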

🔧 Technical Details

The analysis of the evaluation results shows competitive performance in natural language generation and understanding tasks. Additionally, the learned expert-routing mechanism exploits implicit phylogenetic linguistic patterns to potentially alleviate the curse of multilinguality.

📄 License

The model is licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

📖 Citation

If you use our work in your research, please make sure to cite it:

@inproceedings{srivastava-etal-2025-lola,
  author    = {Nikit Srivastava and Denis Kuchelev and Tatiana Moteu Ngoli and Kshitij Shetty and Michael Röder and Hamada Zahera and Diego Moussallem and Axel-Cyrille Ngonga Ngomo},
  title     = {{LOLA} -- An Open-Source Massively Multilingual Large Language Model},
  booktitle = {Proceedings of the 31st International Conference on Computational Linguistics},
  editor    = {Owen Rambow and Leo Wanner and Marianna Apidianaki and Hend Al-Khalifa and Barbara Di Eugenio and Steven Schockaert},
  month     = jan,
  year      = {2025},
  address   = {Abu Dhabi, UAE},
  publisher = {Association for Computational Linguistics},
  pages     = {6420--6446},
  url       = {https://aclanthology.org/2025.coling-main.428/},
  note      = {arXiv:2409.11272 [cs.CL]},
}

Paper: https://arxiv.org/abs/2409.11272
