
Doge 160M Instruct

Developed by SmallDoge
Doge 160M is a small language model built on a dynamic masked attention mechanism, trained with supervised fine-tuning (SFT) followed by direct preference optimization (DPO).
Downloads 2,223
Release date: 2/18/2025

Model Overview

Doge employs dynamic masked attention for sequence transformation and can use either a multi-layer perceptron or a cross-domain mixture of experts for state transformation. The model is suited to tasks such as Q&A and supports English.
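
As a quick illustration of how an instruct-tuned checkpoint like this is typically used, the sketch below loads the model with the Hugging Face transformers library and runs a single Q&A turn. The repository id "SmallDoge/Doge-160M-Instruct", the need for trust_remote_code, and the presence of a chat template are assumptions for illustration, not details confirmed by this page.

```python
# Minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub
# as "SmallDoge/Doge-160M-Instruct" (assumed repo id) and that the custom Doge
# architecture requires trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SmallDoge/Doge-160M-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Build a single-turn prompt with the tokenizer's chat template
# (assumed to be defined, as is typical for instruct-tuned checkpoints).
messages = [{"role": "user", "content": "What is the capital of France?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Generate a short answer and strip the prompt tokens from the output.
outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```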

Model Features

Dynamic masked attention
Allows the Transformer to use self-attention during training and a state-space formulation during inference
Cross-domain mixture of experts
Can directly inherit weights from the multi-layer perceptron for further training
Two-stage training
Supervised fine-tuning (SFT) on SmolTalk first, followed by direct preference optimization (DPO) on UltraFeedback Binarized (a sketch of this pipeline follows this list)
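
The sketch below outlines what this two-stage recipe can look like with the Hugging Face TRL library. The base-checkpoint repo id, dataset configurations, splits, and hyperparameters are illustrative assumptions written against recent versions of TRL, not the exact configuration used to train Doge 160M Instruct.

```python
# Illustrative two-stage pipeline: SFT on SmolTalk, then DPO on UltraFeedback Binarized.
# Repo id, dataset choices, and hyperparameters are assumptions for illustration only.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

model_id = "SmallDoge/Doge-160M"  # assumed base-checkpoint repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Stage 1: supervised fine-tuning on SmolTalk (conversational "messages" format).
sft_data = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")
sft_trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="doge-160m-sft"),
    train_dataset=sft_data,
    processing_class=tokenizer,
)
sft_trainer.train()

# Stage 2: direct preference optimization on UltraFeedback Binarized
# ("chosen"/"rejected" preference pairs).
dpo_data = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    args=DPOConfig(output_dir="doge-160m-dpo", beta=0.1),
    train_dataset=dpo_data,
    processing_class=tokenizer,
)
dpo_trainer.train()
```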

Model Capabilities

Text generation
Q&A system
Instruction following

Use Cases

Dialogue system
Daily conversation
Can be used to build chatbots for daily conversations
Q&A system
Knowledge Q&A
Can be used to answer various user questions