# Doge 320M
Doge 320M is a language model that uses Dynamic Mask Attention for sequence transformation and can use a Multi-Layer Perceptron or Cross Domain Mixture of Experts for state transformation. The model is trained by the SmallDoge community.
## Quick Start
Doge uses Dynamic Mask Attention for sequence transformation and can use a Multi-Layer Perceptron or Cross Domain Mixture of Experts for state transformation. Dynamic Mask Attention lets the Transformer use self-attention during training and a state-space formulation during inference, and Cross Domain Mixture of Experts can directly inherit the weights of the Multi-Layer Perceptron for further training. The model is trained by the SmallDoge community. A paper describing the algorithm and model architecture in detail is coming soon. All training details and code are available in the [small-doge](https://github.com/SmallDoges/small-doge) repository.
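Which architecture variant a given checkpoint uses is recorded in its configuration. A minimal sketch for inspecting it with the standard Transformers API (the exact field names depend on the custom configuration class shipped with the model and are not listed here):

```python
from transformers import AutoConfig

# Fetch the checkpoint's configuration; trust_remote_code is needed because
# Doge ships custom modeling code on the Hub.
config = AutoConfig.from_pretrained("SmallDoge/Doge-320M", trust_remote_code=True)

# Printing the config shows the architecture hyperparameters, including
# whether this checkpoint uses the plain MLP or the CDMoE state transformation.
print(config)
```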
## Usage Examples
### Basic Usage
```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM

>>> tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-320M")
>>> model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-320M", trust_remote_code=True)
>>> inputs = tokenizer("Hey how are you doing?", return_tensors="pt")

>>> out = model.generate(**inputs, max_new_tokens=100)
>>> print(tokenizer.batch_decode(out))
```
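The example above uses greedy decoding. Standard `generate` arguments and a `TextStreamer` from Transformers can be combined for sampled, streamed output; the sampling values below are illustrative rather than settings recommended by the authors:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-320M")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-320M", trust_remote_code=True)

inputs = tokenizer("Hey how are you doing?", return_tensors="pt")

# Print tokens as they are generated, skipping the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Sample instead of greedy decoding; temperature/top_p here are illustrative values.
model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    streamer=streamer,
)
```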
## Documentation
### Model Details
We build Doge by pre-training on [Smollm-Corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus). If you want to continue pre-training this model, you can find the unconverged checkpoint [here](https://huggingface.co/SmallDoge/Doge-320M-checkpoint). These models have not been fine-tuned for instruction following; the instruction-tuned model is available [here](https://huggingface.co/SmallDoge/Doge-320M-Instruct).
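For chat-style prompts, the instruction-tuned checkpoint linked above is the better starting point. A minimal sketch, assuming its tokenizer ships a chat template (check the Doge-320M-Instruct card for the recommended prompt format and generation settings):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-320M-Instruct")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-320M-Instruct", trust_remote_code=True)

# Build the prompt with the chat template bundled in the tokenizer (assumed to exist).
messages = [{"role": "user", "content": "Write a short poem about a dog."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

out = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```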
Pre-Training:
| Model | Training Data | Steps | Content Length | Tokens | LR | Batch Size | Precision | RTX 4090 GPU hours |
|---|---|---|---|---|---|---|---|---|
| Doge-20M | smollm-corpus | 8k | 2048 | 4B | 8e-3 | 0.5M | bfloat16 | 14 |
| Doge-60M | smollm-corpus | 16k | 2048 | 16B | 6e-3 | 1M | bfloat16 | 128 |
| Doge-160M | smollm-corpus | 24k | 2048 | 32B | 4e-3 | 1.5M | bfloat16 | 522 |
| Doge-320M | smollm-corpus | 32k | 2048 | 64B | 2e-3 | 2M | bfloat16 | 1856 |
Evaluation:
| Model | MMLU | TriviaQA | ARC | PIQA | HellaSwag | OBQA | Winogrande | tokens / s on i7-11 CPU |
|---|---|---|---|---|---|---|---|---|
| Doge-20M | 25.4 | 0.03 | 29.8 | 58.4 | 27.3 | 25.6 | 50.2 | 142 |
| Doge-60M | 26.4 | 0.2 | 37.9 | 61.4 | 31.5 | 28.0 | 50.8 | 62 |
| Doge-160M | 29.2 | 4.8 | 44.4 | 70.1 | 43.4 | 34.4 | 52.2 | 28 |
| Doge-320M | 35.6 | 9.4 | 55.4 | 73.9 | 52.7 | 37.9 | 59.3 | 16 |
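The card does not state which harness produced these numbers. To run comparable evaluations yourself, one option is EleutherAI's lm-evaluation-harness; the sketch below reflects an assumed setup (task selection, zero-shot, batch size), not the authors' exact protocol:

```python
import lm_eval

# Zero-shot evaluation of the checkpoint on a subset of the benchmarks above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=SmallDoge/Doge-320M,trust_remote_code=True",
    tasks=["piqa", "hellaswag", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```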
Procedure:

Environment:
- Image: nvcr.io/nvidia/pytorch:24.12-py3
- Hardware: 1x NVIDIA RTX 4090
- Software: Transformers
## License
This project is licensed under the Apache-2.0 license.
## Citation
```bibtex
@misc{smalldoges,
  title={SmallDoges: A Family of Dynamic UltraFast Small Language Models},
  author={Jingze, Shi and Yifan, Wu and Bingheng, Wu and Yuyu, Luo},
  year={2025},
  month={March},
  url={https://github.com/SmallDoges/small-doge}
}
```