🚀 Doge 160M Reason Distill
The Doge-160M-Reason-Distill model uses Dynamic Mask Attention for sequence transformation and can use a Multi-Layer Perceptron or Cross-Domain Mixture of Experts for state transformation. Trained by the SmallDoge community, it handles question-answering tasks effectively. For the detailed algorithm and model architecture, see the accompanying paper; all training details and code are publicly available in the GitHub repository.
🚀 Quick Start
Doge uses Dynamic Mask Attention for sequence transformation and can use a Multi-Layer Perceptron or Cross-Domain Mixture of Experts for state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and state space during inference, while the Cross-Domain Mixture of Experts can directly inherit the weights of the Multi-Layer Perceptron for further training. This model was trained by the SmallDoge community. For the detailed algorithm and model architecture, refer to Wonderful Matrices; all training details and code are publicly available in the small-doge repository.
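The weight-inheritance idea can be illustrated with a small sketch. This is a conceptual toy in NumPy, not the actual Doge implementation: each hypothetical expert is initialized as a copy of a trained MLP's weights, so a router that averages the experts reproduces the MLP exactly and further training starts from the MLP's function.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts = 8, 32, 4

# A "trained" dense MLP: y = relu(x @ W_in) @ W_out
W_in = rng.normal(size=(d_model, d_hidden)) / np.sqrt(d_model)
W_out = rng.normal(size=(d_hidden, d_model)) / np.sqrt(d_hidden)

def mlp(x):
    return np.maximum(x @ W_in, 0) @ W_out

# Hypothetical expert initialization: every expert copies the MLP weights.
experts = [(W_in.copy(), W_out.copy()) for _ in range(n_experts)]

def moe(x, router_probs):
    # router_probs: routing probabilities over experts, summing to 1
    return sum(p * (np.maximum(x @ Wi, 0) @ Wo)
               for p, (Wi, Wo) in zip(router_probs, experts))

x = rng.normal(size=(3, d_model))
uniform = np.ones(n_experts) / n_experts
# With copied experts and uniform routing, the MoE starts where the MLP left off.
assert np.allclose(moe(x, uniform), mlp(x))
```

Training can then specialize the experts and the router away from this starting point without an initial loss spike.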
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-160M-Reason-Distill")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-160M-Reason-Distill", trust_remote_code=True)

generation_config = GenerationConfig(
    max_new_tokens=100,
    use_cache=True,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.0
)
streamer = TextStreamer(
    tokenizer=tokenizer,
    skip_prompt=True
)

system_prompt = """
Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop a well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analyzing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail the necessary steps needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:
""".strip()

prompt = "Which number is bigger, 3.9 or 3.11?"

conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt}
]
inputs = tokenizer.apply_chat_template(
    conversation=conversation,
    tokenize=True,
    return_tensors="pt",
)

outputs = model.generate(
    inputs,
    tokenizer=tokenizer,
    generation_config=generation_config,
    streamer=streamer
)
```
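Because the system prompt asks the model to wrap its output in `<|begin_of_thought|>` / `<|begin_of_solution|>` markers, you may want to separate the two sections after decoding. Below is a small helper of our own (not part of the model's API) that splits a decoded completion on those markers; the `demo` string is a stand-in for real model output.

```python
# Hypothetical helper (not provided by the model): split a decoded completion
# into its Thought and Solution sections using the markers the system prompt
# asks the model to emit.
def split_reasoning(text: str) -> dict:
    sections = {}
    for name, start, end in [
        ("thought", "<|begin_of_thought|>", "<|end_of_thought|>"),
        ("solution", "<|begin_of_solution|>", "<|end_of_solution|>"),
    ]:
        lo = text.find(start)
        hi = text.find(end)
        if lo != -1 and hi != -1:
            sections[name] = text[lo + len(start):hi].strip()
        else:
            sections[name] = None  # marker missing: section not produced
    return sections

# Illustrative stand-in for model output:
demo = ("<|begin_of_thought|>Compare the fractional parts: 0.9 > 0.11."
        "<|end_of_thought|> <|begin_of_solution|>3.9 is bigger."
        "<|end_of_solution|>")
print(split_reasoning(demo)["solution"])  # -> 3.9 is bigger.
```

In practice you would pass `tokenizer.decode(outputs[0], skip_special_tokens=True)` instead of `demo`; small models may occasionally omit a marker, which the helper reports as `None`.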
📚 Documentation
We build the Doge-Reason-Distill models by performing supervised fine-tuning (SFT) on Reason-Distill.
⚠️ Important Note
Larger models are currently being trained and will be uploaded soon.
Supervised Fine-Tuning (SFT) Details
Training Procedure
- Supervised Fine-Tuning (SFT):

Training Environment
- Image: nvcr.io/nvidia/pytorch:24.12-py3
- Hardware: 1x NVIDIA RTX 4090
- Software: Transformers, TRL
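An SFT run of this kind can be sketched with TRL's `SFTTrainer`. This is a configuration sketch only: the base checkpoint name, the `SmallDoge/Reason-Distill` dataset id, and all hyperparameters are illustrative assumptions, not the exact recipe used for this model.

```python
# Configuration sketch -- checkpoint name, dataset id, and hyperparameters
# are assumptions for illustration, not the recipe used for this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_name = "SmallDoge/Doge-160M"  # assumed base checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("SmallDoge/Reason-Distill", split="train")  # assumed dataset id

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="./doge-reason-distill-sft",
        num_train_epochs=2,              # illustrative
        per_device_train_batch_size=1,   # fits a single RTX 4090
        gradient_accumulation_steps=16,  # illustrative
        learning_rate=8e-4,              # illustrative
        bf16=True,
    ),
)
trainer.train()
```

See the small-doge repository for the actual training scripts and hyperparameters.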
📄 License
This project is licensed under the Apache-2.0 License.
📖 Citation
If you use this model in your research, please cite it with the following BibTeX:
```bibtex
@misc{shi2024wonderfulmatrices,
    title={Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture},
    author={Jingze Shi and Bingheng Wu},
    year={2024},
    eprint={2412.11834},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2412.11834},
}
```