# Dots1
Dots1 is a large-scale MoE model that offers high-performance text generation capabilities, supporting both English and Chinese.
## Quick Start
Visit our Hugging Face page, search for checkpoints with names starting with `dots.llm1`, or visit the dots1 collection, and you will find all you need. Enjoy!
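If you prefer to do this programmatically, here is a minimal sketch that lists the matching checkpoints with `huggingface_hub`, assuming they are published under the `rednote-hilab` organization as in the examples below:

```python
# Hedged sketch: list dots.llm1 checkpoints on the Hugging Face Hub instead of
# browsing the web UI. Requires the huggingface_hub package.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(author="rednote-hilab", search="dots.llm1"):
    print(model.id)  # e.g. rednote-hilab/dots.llm1.base, rednote-hilab/dots.llm1.inst
```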
⨠Features
- **High-Performance Model**: The `dots.llm1` model is a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models (a toy routing sketch follows this list).
- **Enhanced Data Processing**: We propose a scalable and fine-grained three-stage data processing framework designed to generate large-scale, high-quality, and diverse data for pretraining.
- **No Synthetic Data during Pretraining**: 11.2 trillion high-quality, non-synthetic tokens were used in base model pretraining.
- **Performance and Cost Efficiency**: `dots.llm1` is an open-source model that activates only 14B parameters at inference, delivering both comprehensive capabilities and high computational efficiency.
- **Innovative Infrastructure**: We introduce an innovative MoE all-to-all communication and computation overlapping recipe based on interleaved 1F1B pipeline scheduling and an efficient grouped GEMM implementation to boost computational efficiency.
- **Open Accessibility to Model Dynamics**: Intermediate model checkpoints for every 1T tokens trained are released, facilitating future research into the learning dynamics of large language models.
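To make the fine-grained MoE routing concrete, here is a toy, self-contained sketch of top-6 routing over 128 routed experts plus 2 always-on shared experts. It is an illustration only, not the dots.llm1 implementation; all class, module, and variable names are invented for the example:

```python
# Toy fine-grained MoE layer: top-k routed experts plus always-on shared experts.
# Illustrative only; not the dots.llm1 code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, hidden=16, n_routed=128, n_shared=2, top_k=6):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, n_routed, bias=False)
        self.routed_experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_routed))
        self.shared_experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_shared))

    def forward(self, x):  # x: (num_tokens, hidden)
        scores = F.softmax(self.router(x), dim=-1)       # routing probabilities per token
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the top-6 experts
        shared = sum(e(x) for e in self.shared_experts)  # shared experts are always active
        routed_rows = []
        for t in range(x.size(0)):                       # naive per-token dispatch
            routed_rows.append(sum(w * self.routed_experts[int(e)](x[t])
                                   for w, e in zip(weights[t], idx[t])))
        return shared + torch.stack(routed_rows)

# Only 6 of the 128 routed experts run for each token, which is why only a small
# fraction of the total parameters (14B of 142B in dots.llm1) is active at inference.
layer = ToyMoELayer()
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```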
## Installation
The Docker images are available on Docker Hub and are based on the official vLLM images. You can start a server via vLLM:
```bash
docker run --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    --ipc=host \
    rednotehilab/dots1:vllm-openai-v0.9.0.1 \
    --model rednote-hilab/dots.llm1.inst \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --served-model-name dots1
```
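Once the container is up, it should expose vLLM's OpenAI-compatible API on port 8000. Below is a minimal sketch of querying it with the `openai` Python client; the client package and the placeholder API key are assumptions, and the model name matches `--served-model-name` above:

```python
# Hedged sketch: chat with the server started above via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key is a placeholder
response = client.chat.completions.create(
    model="dots1",  # matches --served-model-name in the docker command
    messages=[{"role": "user", "content": "Give me a one-sentence summary of MoE models."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```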
## Usage Examples
### Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "rednote-hilab/dots.llm1.base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
### Advanced Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "rednote-hilab/dots.llm1.inst"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=200)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)
```
## Documentation
### Model Summary
| Property | Details |
|----------|---------|
| Model Type | A MoE model with 14B activated and 142B total parameters trained on 11.2T tokens |
| Training Stages | Pretraining and SFT |
| Architecture | Multi-head attention with QK-Norm in the attention layer; fine-grained MoE utilizing top-6 out of 128 routed experts, plus 2 shared experts |
| Number of Layers | 62 |
| Number of Attention Heads | 32 |
| Supported Languages | English, Chinese |
| Context Length | 32,768 tokens |
| License | MIT |
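As a sanity check, some of these numbers can be read back from the released checkpoint's configuration. A hedged sketch follows; the attribute names (`num_hidden_layers`, `num_attention_heads`, `max_position_embeddings`) follow common Hugging Face conventions and are assumptions about the actual dots.llm1 config:

```python
# Hedged sketch: inspect the checkpoint config; attribute names are assumptions.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("rednote-hilab/dots.llm1.base", trust_remote_code=True)
print(getattr(config, "num_hidden_layers", None))        # expected: 62
print(getattr(config, "num_attention_heads", None))      # expected: 32
print(getattr(config, "max_position_embeddings", None))  # expected: 32768
```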
### Model Downloads
| Model | #Total Params | #Activated Params | Context Length | Download Link |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| dots.llm1.base | 142B | 14B | 32K | Hugging Face |
| dots.llm1.inst | 142B | 14B | 32K | Hugging Face |
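For offline use, the checkpoints in this table can also be fetched ahead of time with `huggingface_hub`; a minimal sketch (the local cache location is whatever your environment uses):

```python
# Hedged sketch: pre-download a checkpoint from the table above into the local HF cache.
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="rednote-hilab/dots.llm1.inst")
print(local_path)  # directory containing the downloaded model files
```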
### Inference with Other Libraries
- **vllm**: [vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs. Official support for this feature is covered in PR #18254.

  ```bash
  vllm serve dots.llm1.inst --port 8000 --tensor-parallel-size 8
  ```
- **sglang**: [SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models. SGLang can be used to launch a server with an OpenAI-compatible API service. Official support for this feature is covered in [PR #6471](https://github.com/sgl-project/sglang/pull/6471).

  ```bash
  python -m sglang.launch_server --model-path dots.llm1.inst --tp 8 --host 0.0.0.0 --port 8000
  ```
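Both commands above launch an OpenAI-compatible server on port 8000, so the same client code works for either. Here is a hedged streaming sketch with the `openai` Python client; it assumes the served model name defaults to the model path/ID used in the launch commands:

```python
# Hedged sketch: stream tokens from either server above via the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key is a placeholder
stream = client.chat.completions.create(
    model="dots.llm1.inst",  # assumed to match the model name used at launch
    messages=[{"role": "user", "content": "Explain top-k expert routing in two sentences."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```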
## Technical Details
Detailed evaluation results are reported in the dots.llm1 technical report.
## License
This project is licensed under the MIT License.
## Citation
If you find `dots.llm1` useful or want to use it in your projects, please kindly cite our paper:
```bibtex
@article{dots1,
  title={dots.llm1 Technical Report},
  author={rednote-hilab},
  journal={arXiv preprint arXiv:TBD},
  year={2025}
}
```
Hugging Face | Paper | Demo | WeChat | rednote
## News
- 2025.06.06: We released the `dots.llm1` series. Check our report for more details!