🚀 Doge 160M Reason Distill
The Doge 160M Reason Distill model uses Dynamic Mask Attention for sequence transformation and can use either a Multi-Layer Perceptron or a Cross-Domain Mixture of Experts for state transformation. It was trained by the SmallDoge community and handles question-answering tasks effectively. The detailed algorithm and model architecture are described in the accompanying paper, and all training details and code are publicly available in the GitHub repository.
🚀 Quick Start
Doge uses Dynamic Mask Attention for sequence transformation and can use either a Multi-Layer Perceptron or a Cross-Domain Mixture of Experts for state transformation. Dynamic Mask Attention allows the Transformer to use self-attention during training and state space during inference, while the Cross-Domain Mixture of Experts can directly inherit the weights of the Multi-Layer Perceptron for further training. This model was trained by the SmallDoge community. For the detailed algorithm and model architecture, please refer to Wonderful Matrices; all training details and code are publicly available in the small-doge repository.
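For intuition only, here is a rough conceptual sketch of the dynamic-mask idea: a content-dependent mask is computed from the value states and added to the attention scores before the softmax, so that suppressed positions are effectively skipped. All names, shapes, and the exact mask formulation below are illustrative assumptions, not the model's actual implementation; see Wonderful Matrices and the small-doge repository for the real code.

import torch
import torch.nn.functional as F

def dynamic_mask_attention_sketch(q, k, v, dt_proj, causal_mask):
    # q, k, v: (batch, heads, seq_len, head_dim); dt_proj: a small linear layer head_dim -> 1 (illustrative)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    # Content-dependent "dynamic mask" derived from the value states (illustrative formulation)
    dyn = dt_proj(v).squeeze(-1)       # (batch, heads, seq_len)
    dyn_bias = F.logsigmoid(dyn)       # <= 0; strongly negative entries suppress those key positions
    scores = scores + dyn_bias.unsqueeze(-2) + causal_mask
    return F.softmax(scores, dim=-1) @ v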
💻 Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer

# Load the tokenizer and model (trust_remote_code is required for the custom Doge architecture)
tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-160M-Reason-Distill")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-160M-Reason-Distill", trust_remote_code=True)
# Generation settings: sampling with temperature and nucleus (top-p) filtering
generation_config = GenerationConfig(
    max_new_tokens=100,
    use_cache=True,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.0
)
# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(
    tokenizer=tokenizer,
    skip_prompt=True
)
system_prompt = """
Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:
""".strip()
prompt = "Which number is bigger, 3.9 or 3.11?"
# Build the conversation in chat format
conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt}
]
inputs = tokenizer.apply_chat_template(
    conversation=conversation,
    tokenize=True,
    return_tensors="pt",
)
# Generate the response, streaming tokens through the TextStreamer
outputs = model.generate(
    inputs,
    tokenizer=tokenizer,
    generation_config=generation_config,
    streamer=streamer
)
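To capture the full generated text as a string in addition to the streamed output, you can decode the returned token IDs. This small follow-up reuses the variables from the example above:

# Strip the prompt tokens and decode only the newly generated portion
generated_text = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(generated_text)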
📚 Documentation
We build the Doge-Reason-Distill models by performing supervised fine-tuning (SFT) on Reason-Distill.
⚠️ Important Note
Larger models are currently being trained and will be uploaded soon.
Supervised Fine-Tuning (SFT) Information
Training Procedure
- Supervised Fine-Tuning (SFT):

Training Environment
- Image: nvcr.io/nvidia/pytorch:24.12-py3
- Hardware: 1x NVIDIA RTX 4090
- Software: Transformers, TRL (an illustrative SFT sketch is shown below)
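The exact SFT hyperparameters are not listed here. As a rough illustration of the training setup with TRL, the following is a minimal sketch, assuming a recent TRL version; the dataset name, base checkpoint, output path, and hyperparameter values are placeholder assumptions rather than the settings used to train this model.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Placeholder dataset and base checkpoint -- adjust to the actual Reason-Distill data and base model
dataset = load_dataset("SmallDoge/Reason-Distill", split="train")
tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-160M-Instruct")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-160M-Instruct", trust_remote_code=True)

# Illustrative hyperparameters only; the real training configuration may differ
training_args = SFTConfig(
    output_dir="./doge-160m-reason-distill",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=2,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()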
📄 License
This project is licensed under the Apache-2.0 License.
📖 Citation
If you use this model in your research, please cite it using the following BibTeX entry:
@misc{shi2024wonderfulmatrices,
      title={Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture},
      author={Jingze Shi and Bingheng Wu},
      year={2024},
      eprint={2412.11834},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.11834},
}