# Doge 160M Instruct
Doge 160M Instruct is a model that uses Dynamic Mask Attention for sequence transformation and can use either a Multi-Layer Perceptron or a Cross Domain Mixture of Experts for state transformation. It is trained by the SmallDoge community and is intended for question-answering and instruction-following tasks.
## Quick Start
Doge uses Dynamic Mask Attention for sequence transformation and can use either a Multi-Layer Perceptron or a Cross Domain Mixture of Experts for state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and a state-space formulation during inference, and the Cross Domain Mixture of Experts can directly inherit the weights of the Multi-Layer Perceptron for further training. This model is trained by the SmallDoge community. A paper detailing the algorithm and model architecture is coming soon; all training details and code are available in the [small-doge](https://github.com/SmallDoges/small-doge) repository.
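To give a rough intuition for the dynamic-masking idea, here is a conceptual sketch of input-dependent attention masking. The `gate_proj` parameter and the exact gating formula are illustrative assumptions, not the SmallDoge implementation; see the small-doge repository for the real code.

```python
import torch
import torch.nn.functional as F

def dynamic_mask_attention(q, k, v, gate_proj):
    """Conceptual sketch only: attention scores receive an input-dependent
    additive mask, so uninformative key positions contribute ~nothing and
    can be skipped at inference time."""
    # Standard scaled dot-product attention scores.
    scores = q @ k.transpose(-1, -2) / q.size(-1) ** 0.5
    # Gate each key position from its value vector (hypothetical formulation).
    gate = F.softplus(v @ gate_proj)                              # (batch, seq, 1)
    scores = scores + torch.log(gate.clamp_min(1e-6)).transpose(-1, -2)
    return scores.softmax(dim=-1) @ v

# Toy usage: batch of 2, sequence of 8, hidden size 16.
q = k = v = torch.randn(2, 8, 16)
gate_proj = torch.randn(16, 1)
out = dynamic_mask_attention(q, k, v, gate_proj)  # (2, 8, 16)
```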
## Usage Examples

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-160M-Instruct")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-160M-Instruct", trust_remote_code=True)

generation_config = GenerationConfig(
    max_new_tokens=100,
    use_cache=True,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.0,
)

# Stream generated tokens to stdout, skipping the prompt.
streamer = TextStreamer(
    tokenizer=tokenizer,
    skip_prompt=True,
)

prompt = "Hi, how are you doing today?"
conversation = [
    {"role": "user", "content": prompt},
]
inputs = tokenizer.apply_chat_template(
    conversation=conversation,
    tokenize=True,
    add_generation_prompt=True,  # append the assistant turn so the model replies
    return_tensors="pt",
)

outputs = model.generate(
    inputs,
    tokenizer=tokenizer,
    generation_config=generation_config,
    streamer=streamer,
)
```
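If you want the full reply as a string instead of (or in addition to) the streamed output, decode the returned ids; note that `outputs` also contains the prompt tokens:

```python
# Decode the full sequence (prompt + reply), dropping special tokens.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```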
## Documentation

### Model Details
We build Doge-Instruct by first applying SFT on SmolTalk and then DPO on UltraFeedback Binarized; a minimal TRL sketch of this two-stage recipe follows the tables below.
SFT:
| Model | Training Data | Epochs | Context Length | LR | Batch Size | Precision |
|---|---|---|---|---|---|---|
| [Doge-20M-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-20M-Instruct-SFT) | smoltalk | 2 | 2048 | 8e-4 | 0.25M | bfloat16 |
| [Doge-20M-MoE-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-20M-MoE-Instruct-SFT) | smoltalk | 2 | 2048 | 8e-4 | 0.25M | bfloat16 |
| [Doge-60M-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-60M-Instruct-SFT) | smoltalk | 2 | 2048 | 6e-4 | 0.25M | bfloat16 |
| [Doge-120M-MoE-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-120M-MoE-Instruct-SFT) | smoltalk | 2 | 2048 | 6e-4 | 0.25M | bfloat16 |
| [Doge-160M-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-160M-Instruct-SFT) | smoltalk | 2 | 2048 | 4e-4 | 0.25M | bfloat16 |
| [Doge-320M-Instruct-SFT](https://huggingface.co/SmallDoge/Doge-320M-Instruct-SFT) | smoltalk | 2 | 2048 | 2e-4 | 0.25M | bfloat16 |
DPO:
| Model | Training Data | Epochs | Context Length | LR | Batch Size | Precision |
|---|---|---|---|---|---|---|
| [Doge-20M-Instruct](https://huggingface.co/SmallDoge/Doge-20M-Instruct) | ultrafeedback_binarized | 2 | 1024 | 8e-5 | 0.125M | bfloat16 |
| [Doge-20M-MoE-Instruct](https://huggingface.co/SmallDoge/Doge-20M-MoE-Instruct) | ultrafeedback_binarized | 2 | 1024 | 8e-5 | 0.125M | bfloat16 |
| [Doge-60M-Instruct](https://huggingface.co/SmallDoge/Doge-60M-Instruct) | ultrafeedback_binarized | 2 | 1024 | 6e-5 | 0.125M | bfloat16 |
| [Doge-120M-MoE-Instruct](https://huggingface.co/SmallDoge/Doge-120M-MoE-Instruct) | ultrafeedback_binarized | 2 | 1024 | 6e-5 | 0.125M | bfloat16 |
| [Doge-160M-Instruct](https://huggingface.co/SmallDoge/Doge-160M-Instruct) | ultrafeedback_binarized | 2 | 1024 | 4e-5 | 0.125M | bfloat16 |
| [Doge-320M-Instruct](https://huggingface.co/SmallDoge/Doge-320M-Instruct) | ultrafeedback_binarized | 2 | 1024 | 2e-5 | 0.125M | bfloat16 |
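As promised above, here is a minimal sketch of the two-stage recipe using TRL's `SFTTrainer` and `DPOTrainer`. The base checkpoint name, dataset ids, and trainer arguments are assumptions for illustration (argument names also vary across TRL versions); the actual training scripts live in the small-doge repository.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

# Assumed base checkpoint; see small-doge for the real scripts.
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-160M", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-160M")

# Stage 1: SFT on SmolTalk, using the 160M hyperparameters from the table.
sft_trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=load_dataset("HuggingFaceTB/smoltalk", "all", split="train"),
    args=SFTConfig(
        output_dir="Doge-160M-Instruct-SFT",
        num_train_epochs=2,      # epochs from the SFT table
        learning_rate=4e-4,      # LR for the 160M model
        max_seq_length=2048,     # context length from the SFT table
        bf16=True,               # precision from the SFT table
    ),
)
sft_trainer.train()

# Stage 2: DPO on UltraFeedback Binarized, starting from the SFT model.
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    processing_class=tokenizer,
    train_dataset=load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs"),
    args=DPOConfig(
        output_dir="Doge-160M-Instruct",
        num_train_epochs=2,      # epochs from the DPO table
        learning_rate=4e-5,      # LR for the 160M model
        max_length=1024,         # context length from the DPO table
        bf16=True,
    ),
)
dpo_trainer.train()
```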
Evaluation:
| Model | IFEval (Prompt Strict Acc) | MMLU | BBH | ARC | PIQA | HellaSwag | Tokens/s (i7-11 CPU) |
|---|---|---|---|---|---|---|---|
| [Doge-20M-Instruct](https://huggingface.co/SmallDoge/Doge-20M-Instruct) | 9.2 | 26.3 | 18.3 | 29.2 | 57.8 | 27.8 | 142 |
| [Doge-20M-MoE-Instruct](https://huggingface.co/SmallDoge/Doge-20M-MoE-Instruct) | 13.7 | 26.5 | 26.3 | 31.1 | 58.2 | 27.9 | 132 |
| [Doge-60M-Instruct](https://huggingface.co/SmallDoge/Doge-60M-Instruct) | 9.4 | 27.5 | 27.7 | 37.5 | 61.4 | 32.1 | 62 |
| [Doge-120M-MoE-Instruct](https://huggingface.co/SmallDoge/Doge-120M-MoE-Instruct) | 24.4 | 28.2 | 30.1 | 44.2 | 62.1 | 36.3 | 58 |
| [Doge-160M-Instruct](https://huggingface.co/SmallDoge/Doge-160M-Instruct) | 16.8 | 29.7 | 29.1 | 42.8 | 64.1 | 37.1 | 28 |
| [Doge-320M-Instruct](https://huggingface.co/SmallDoge/Doge-320M-Instruct) | 28.5 | 30.3 | 31.9 | 51.7 | 71.0 | 50.6 | 16 |
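The throughput column can be approximated with a simple timing loop like the one below, reusing the `model` and `tokenizer` from the usage example. This is a rough sketch; the community's exact benchmarking setup is not documented here.

```python
import time
import torch

model = model.to("cpu").eval()
input_ids = tokenizer("Hi, how are you doing today?", return_tensors="pt").input_ids

with torch.no_grad():
    start = time.perf_counter()
    output_ids = model.generate(input_ids, max_new_tokens=100, do_sample=False)
    elapsed = time.perf_counter() - start

# `output_ids` includes the prompt, so count only the newly generated tokens.
new_tokens = output_ids.shape[1] - input_ids.shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```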
Procedure:
- SFT: (training curve figure not reproduced here)
- DPO: (training curve figure not reproduced here)
Environment:
- Image: nvcr.io/nvidia/pytorch:24.12-py3
- Hardware: 1x NVIDIA RTX 4090
- Software: Transformers, TRL
## License
This project is licensed under the Apache-2.0 license. You can find the detailed license information [here](https://github.com/SmallDoges/small-doge/blob/main/LICENSE).
## Citation
```bibtex
@misc{smalldoges,
  title={SmallDoges: A Family of Dynamic UltraFast Small Language Models},
  author={Jingze, Shi and Yifan, Wu and Bingheng, Wu and Yuyu, Luo},
  year={2025},
  month={March},
  url={https://github.com/SmallDoges/small-doge}
}
```