
Doge 160M

Developed by SmallDoge
Doge 160M is a small language model trained by the SmallDoge community. It uses a dynamic masked attention mechanism and supports text generation tasks.
Downloads 4,227
Release Time: 2/15/2025

Model Overview

Doge 160M is a small language model based on the Transformer architecture. It uses dynamic masked attention for sequence transformation and a multi-layer perceptron or a cross-domain mixture of experts for state transformation. The model is suited to text generation tasks and performs well on multiple benchmarks.
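
As a rough usage sketch, the model can presumably be loaded through the Hugging Face transformers library. The repository id SmallDoge/Doge-160M and the need for trust_remote_code=True are assumptions, not details stated on this page.

```python
# Minimal sketch: loading Doge 160M for text generation with Hugging Face transformers.
# The repository id "SmallDoge/Doge-160M" and trust_remote_code=True are assumptions
# based on how custom-architecture models are commonly published.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SmallDoge/Doge-160M"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a short continuation of the prompt.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```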

Model Features

Dynamic Masked Attention Mechanism
Allows the Transformer to use self-attention during training and a state-space formulation during inference, improving efficiency (a generic sketch of the masking idea follows this list).
Cross-Domain Mixture of Experts
The mixture of experts can directly inherit weights from the multi-layer perceptron for further training, improving model performance.
Efficient Training
Training completes in about 522 hours on a single RTX 4090 GPU, making the model practical to train in resource-limited environments.
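
As a generic, hypothetical illustration of what "dynamically masking" attention scores can mean, the sketch below keeps only the top-k keys per query before the softmax. This is not the Doge implementation; the top-k rule, function name, and tensor shapes are all assumptions made for the example.

```python
# Generic illustration only: this is NOT the Doge implementation of dynamic
# masked attention. It shows one way an input-dependent mask can restrict
# which positions each query attends to (here: keep the top_k highest scores).
import torch
import torch.nn.functional as F

def dynamic_masked_attention(q, k, v, top_k=4):
    """q, k, v: (batch, seq_len, dim). Hypothetical top-k dynamic mask."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5              # (batch, seq, seq)
    kth_best = scores.topk(top_k, dim=-1).values[..., -1:]   # k-th largest score per query
    masked = scores.masked_fill(scores < kth_best, float("-inf"))
    weights = F.softmax(masked, dim=-1)                       # attention only over kept positions
    return weights @ v

# Tiny smoke test with random tensors.
q = k = v = torch.randn(1, 8, 16)
print(dynamic_masked_attention(q, k, v).shape)  # torch.Size([1, 8, 16])
```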

Model Capabilities

Text Generation
Natural Language Processing

Use Cases

Text Generation
Dialogue Generation
Used to generate natural dialogue responses (see the dialogue sketch after this section).
Performs well on benchmarks such as TriviaQA and HellaSwag.
Content Creation
Used to generate short text content, such as social media posts or brief articles.
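
For the dialogue use case, a minimal sketch with the transformers chat-template API is shown below. It assumes an instruction-tuned variant exists and ships a chat template; the repository id SmallDoge/Doge-160M-Instruct is a guess, not something confirmed by this page.

```python
# Minimal dialogue-generation sketch. The repository id and the presence of a
# chat template are assumptions; adjust to whatever checkpoint is actually published.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SmallDoge/Doge-160M-Instruct"  # assumed instruction-tuned variant
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

messages = [{"role": "user", "content": "Suggest one tip for writing a short social media post."}]
# Render the conversation with the model's chat template, then generate a reply.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```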