Doge 20M Instruct
Doge 20M is a small language model based on a dynamic masked attention mechanism, supporting instruction following and Q&A tasks.
Downloads: 5,010
Release Time: 12/14/2024
Model Overview
Doge employs a dynamic masked attention mechanism for sequence transformation and can use multi-layer perceptrons or a cross-domain mixture of experts for state transitions. The model underwent supervised fine-tuning (SFT) on the SmolTalk dataset, followed by direct preference optimization (DPO) training on the UltraFeedback Binarized dataset.
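A minimal usage sketch for the instruct model follows; it assumes the model is published on Hugging Face under an ID such as SmallDoge/Doge-20M-Instruct (the repository name is not stated here) and that the custom Doge architecture loads through transformers with trust_remote_code enabled.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository ID; check the actual model page before running.
model_id = "SmallDoge/Doge-20M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Build a chat-style prompt with the tokenizer's chat template.
messages = [{"role": "user", "content": "Hi, how are you doing today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)

# Generate a short reply and decode only the newly produced tokens.
outputs = model.generate(
    input_ids, max_new_tokens=100, do_sample=True, temperature=0.8, top_p=0.9
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```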
Model Features
Dynamic Masked Attention Mechanism
Enables the Transformer to use self-attention during training and a state-space formulation during inference (a rough conceptual sketch follows this list)
Cross-domain Mixture of Experts
Can directly inherit weights from multi-layer perceptrons for further training
Efficient Inference
Achieves an inference speed of 142 tokens/sec on an 11th-gen Intel i7 CPU
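The dynamic masked attention mechanism is only named in the list above. The sketch below is a purely conceptual illustration of the general idea, a content-dependent mask added to the attention scores on top of the usual causal mask, and is not the Doge reference implementation; the mask_proj projection and the way its output enters the scores are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMaskedAttentionSketch(nn.Module):
    """Conceptual sketch only: attention whose mask is partly computed from the
    input content rather than being a fixed pattern. Not the Doge implementation."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.mask_proj = nn.Linear(dim, 1)  # hypothetical: one mask logit per position

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / d ** 0.5            # (b, t, t)
        # Content-dependent mask logits, broadcast over the query dimension.
        scores = scores + self.mask_proj(v).transpose(-2, -1)  # add (b, 1, t)
        # Standard causal mask: each position attends only to itself and the past.
        causal = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        scores = scores.masked_fill(causal, float("-inf"))
        return F.softmax(scores, dim=-1) @ v
```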
Model Capabilities
Instruction Following
Question Answering
Text Generation
Use Cases
Dialogue Systems
Daily Conversations
Used for building chatbots for daily conversations
Q&A Systems
Knowledge Q&A
Used to answer various user questions