
Doge 160M Reason Distill

Developed by SmallDoge
Doge 160M Reason Distill is a lightweight language model built on a dynamic masked attention mechanism and a cross-domain mixture of experts, focused on reasoning and question-answering tasks.
Downloads 26
Release Time: 2/18/2025

Model Overview

This model employs a dynamic masked attention mechanism for sequence transformation and can optionally use a multi-layer perceptron or a cross-domain mixture of experts for state transformation. The dynamic masked attention mechanism lets the Transformer use self-attention during training and switch to a state-space mechanism during inference.
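To make the training/inference distinction concrete, here is a minimal NumPy sketch of the idea: full causal self-attention in training mode, and attention restricted to a short lookback window in inference mode. The function names and the fixed-window rule are illustrative assumptions, not the model's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_masked_attention(q, k, v, window=None):
    """Toy attention: full causal self-attention when `window` is None
    (training mode), or attention limited to the last `window` positions
    (a cheaper, state-space-like inference mode).

    Illustrative sketch only; the windowing rule is an assumption.
    """
    seq, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)        # (seq, seq) similarity scores
    i = np.arange(seq)
    mask = i[None, :] > i[:, None]         # causal: hide future positions
    if window is not None:                 # inference: also limit lookback
        mask |= i[None, :] < (i[:, None] - window + 1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
full = dynamic_masked_attention(q, k, v)                 # training-style
windowed = dynamic_masked_attention(q, k, v, window=2)   # inference-style
```

With a window at least as long as the sequence, the two modes agree exactly; a short window trades a small approximation for constant per-token cost at inference time.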

Model Features

Dynamic Masked Attention Mechanism
Lets the model use self-attention during training and switch to a state-space mechanism during inference, improving inference efficiency.
Cross-domain Mixture of Experts
Can directly inherit the weights of a multi-layer perceptron for further training, enhancing the model's adaptability.
Reasoning Distillation
Supervised fine-tuning on the Reason-Distill dataset optimizes the model's reasoning capabilities.
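The weight-inheritance point can be sketched as follows: initialize every expert as a copy of a trained dense MLP, so the mixture reproduces the dense block exactly at initialization and can then be fine-tuned further. This is an illustration of the principle, assuming uniform routing; the actual cross-domain MoE routing in Doge differs.

```python
import numpy as np

def dense_mlp(x, w1, w2):
    """A plain two-layer MLP block with ReLU activation."""
    return np.maximum(x @ w1, 0.0) @ w2

def moe_from_mlp(x, w1, w2, n_experts=4):
    """Mixture of experts initialized from a dense MLP: each expert
    starts as a copy of the MLP's weights (uniform routing assumed),
    so at initialization the MoE matches the dense block exactly."""
    experts = [(w1.copy(), w2.copy()) for _ in range(n_experts)]
    outputs = [dense_mlp(x, e1, e2) for e1, e2 in experts]
    return sum(outputs) / n_experts       # uniform average over experts

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 8))
w1 = rng.standard_normal((8, 16))
w2 = rng.standard_normal((16, 8))
```

Because all experts are identical copies at initialization, `moe_from_mlp(x, w1, w2)` equals `dense_mlp(x, w1, w2)` exactly; subsequent training can then specialize the experts.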

Model Capabilities

Question Answering Generation
Logical Reasoning
Mathematical Problem Solving

Use Cases

Education
Mathematical Problem Solving
Solving basic mathematical comparison and calculation problems
Can correctly compare the magnitudes of numbers and show its reasoning process
Intelligent Assistant
Systematic Problem Solving
Providing a detailed thinking process and solution in a specific format
Can generate a structured thinking process followed by a final solution
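If the model separates its reasoning from its final answer with explicit markers, the two parts can be extracted programmatically. The tag names below are assumptions for illustration; check the chat template shipped with the model for the actual markers.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Split a model response into (thinking, solution).

    The tag names are hypothetical; the real output format may differ.
    """
    start = text.find(open_tag)
    end = text.find(close_tag)
    if start == -1 or end == -1:
        return "", text.strip()           # no explicit reasoning section
    thinking = text[start + len(open_tag):end].strip()
    solution = text[end + len(close_tag):].strip()
    return thinking, solution

reply = "<think>9.11 < 9.9 because 0.11 < 0.90.</think> 9.9 is larger."
thinking, solution = split_reasoning(reply)
```

Here `thinking` holds the step-by-step comparison and `solution` holds the final answer, matching the structured-output use case above.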