# DNA-R1
We introduce DNA-R1, a specialized reasoning model optimized for the Korean language and built upon Microsoft's Phi-4. By applying large-scale reinforcement learning (RL) using the same methodology as DeepSeek-R1, we substantially enhanced the model's Korean reasoning abilities. The model shows a deep understanding of Korean text and strong reasoning skills in mathematics, coding, and general reasoning tasks.
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained('dnotitia/DNA-R1')
model = AutoModelForCausalLM.from_pretrained('dnotitia/DNA-R1', device_map='auto')
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# The prompt quotes a well-known Korean song about a mother who claims to
# dislike jajangmyeon (black-bean noodles), then asks the model to explain
# why she said so.
conversation = [
    {"role": "user", "content": """
어려서부터 우리 집은 가난했었고
남들 다 하는 외식 몇 번 한 적이 없었고
일터에 나가신 어머니 집에 없으면
언제나 혼자서 끓여 먹었던 라면
그러다 라면이 너무 지겨워서
맛있는 것 좀 먹자고 대들었어
그러자 어머니가 마지못해 꺼내신
숨겨두신 비상금으로 시켜주신
짜장면 하나에 너무나 행복했었어
하지만 어머니는 왠지 드시지 않았어
어머니는 짜장면이 싫다고 하셨어
어머니는 짜장면이 싫다고 하셨어
야이야~야 그렇게 살아가고
그렇게 후회하고 눈물도 흘리고
야이야~야 그렇게 살아가고
너무나 아프고 하지만 다시 웃고
---
친구가 쓴 시인데, 여기서 친구의 어머니가 짜장면이 싫다고 하신 이유는?"""},
]

inputs = tokenizer.apply_chat_template(conversation,
                                       add_generation_prompt=True,
                                       return_dict=True,
                                       return_tensors="pt").to(model.device)
_ = model.generate(**inputs, streamer=streamer)
```
## Features

- **Korean-optimized reasoning**: Tailored to understand and reason in Korean, with enhanced capabilities in math, coding, and general reasoning.
- **Multi-stage training**: Learned reasoning patterns specific to the Korean language through a three-stage training pipeline.
- **Advanced capabilities**: Demonstrates self-verification, reflection, and generation of long chains of thought (CoT).
## Installation

No separate installation step is required: the model is distributed through the Hugging Face Hub, and the Quick Start snippet above downloads the model and tokenizer automatically on first use.
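The original card does not list dependencies; as an assumed minimal environment, the Quick Start snippet needs PyTorch, `transformers`, and `accelerate` (the latter for `device_map='auto'`):

```shell
# Assumed dependency set for the Quick Start snippet; pin versions as needed.
pip install torch transformers accelerate
```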
## Documentation

### Training Methodology

Our training pipeline consists of three stages:

- **Stage 1**: Initial SFT with a large Korean non-reasoning dataset (760k examples) reused from our DNA 1.0 8B Instruct training pipeline.
- **Stage 2**: Integration of Korean reasoning patterns from DeepSeek-R1 using a specialized Korean reasoning dataset (300k examples).
- **Stage 3**: Reinforcement learning with GRPO on a combined Korean/English reasoning dataset, with format, accuracy, and language consistency as rewards.
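The Stage 3 reward design is not published in detail. As an illustrative sketch only, the three named reward components might be combined along these lines; the regexes, weights, and Hangul-range heuristic below are our assumptions, not DNA-R1's actual reward functions:

```python
import re

# Expected completion template: <think>...</think> followed by <answer>...</answer>.
THINK_ANSWER = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>\s*$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>/<answer> template."""
    return 1.0 if THINK_ANSWER.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the text inside <answer> exactly matches the reference answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

def language_reward(completion: str, target: str = "ko") -> float:
    """Fraction of letters (tags stripped) in the target script -- a crude proxy."""
    body = re.sub(r"</?(think|answer)>", "", completion)
    letters = [ch for ch in body if ch.isalpha()]
    if not letters:
        return 0.0
    if target == "ko":
        hits = sum(1 for ch in letters if "\uac00" <= ch <= "\ud7a3")  # Hangul syllables
    else:
        hits = sum(1 for ch in letters if ch.isascii())
    return hits / len(letters)

def total_reward(completion: str, gold: str, target: str = "ko") -> float:
    # Illustrative weighting; the weights used in training are unknown.
    return (format_reward(completion)
            + accuracy_reward(completion, gold)
            + 0.5 * language_reward(completion, target))
```

In a real GRPO loop, such a scalar reward would be computed per sampled completion and fed to the policy update.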
### Model Specifications

| Property | Details |
|----------|---------|
| Developed by | Dnotitia Inc. |
| Supported Languages | Korean, English |
| Model Release Date | Mar 6, 2025 |
| Number of Parameters | 14B |
| License | CC BY-NC 4.0 |
### Technical Details

#### Multi-Stage Training Pipeline

We implemented a multi-stage training approach to enhance Phi-4's Korean reasoning capabilities:

- **Initial Foundation (Stage 1)**: Supervised fine-tuning using our extensive Korean non-reasoning dataset from the established DNA 1.0 8B Instruct training pipeline.
- **Reasoning Integration (Stage 2)**: Adaptation of DeepSeek-R1's reasoning patterns with Korean-specific optimization through a carefully curated dataset.
- **Advanced Refinement (Stage 3)**: Reinforcement learning with GRPO to refine reasoning in both Korean and English, with reward signals for format structure, factual accuracy, and language consistency.

This approach enables DNA-R1 to develop sophisticated chain-of-thought (CoT) reasoning for complex problem solving, resulting in a model finely calibrated for Korean-language reasoning while maintaining robust general capabilities.
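Since GRPO may be unfamiliar, here is a minimal sketch of its core idea, group-relative advantage estimation: rewards for several completions sampled from the same prompt are normalized within the group, which removes the need for a learned value/critic network. This is a generic illustration, not DNA-R1's training code:

```python
def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each completion's reward against the
    mean and standard deviation of its sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions that beat their group's average get positive advantages (reinforced), below-average ones get negative advantages, and the advantages always sum to zero within a group.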
### Performance Highlights

Our Korean-specific multi-stage training pipeline significantly enhances the Phi-4 base model's understanding of Korean context, reasoning depth, and response quality. The model excels at:

- Generating nuanced Korean chains of thought (CoT).
- Performing rigorous self-verification.
- Solving complex multi-step problems.
- Maintaining cultural and linguistic context in reasoning.
- Separating deep thinking from concise answers using the `<think>` and `<answer>` tags.
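As a small convenience sketch (our own helper, not part of the model release), a completion can be split on these documented tags:

```python
import re

def split_completion(text: str) -> tuple[str, str]:
    """Split a DNA-R1 completion into (chain_of_thought, final_answer).

    Returns an empty string for either part whose tag pair is missing."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    cot = think.group(1).strip() if think else ""
    final = answer.group(1).strip() if answer else ""
    return cot, final
```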
### Evaluation Results

Below are our evaluation results for DNA-R1 across math, coding, science, Korean, and general-performance benchmarks. Despite having only 14B parameters, DNA-R1 outperforms many larger models across a range of benchmarks.
| Benchmark | Task | DNA-R1 (14B) | DeepSeek-R1-Distill-Qwen-14B | DeepSeek-R1-Distill-Qwen-32B | EXAONE-3.5-32B-Instruct | QwQ-32B-Preview | gpt-4o-0513 | o1-mini | o1-preview |
|---|---|---|---|---|---|---|---|---|---|
| GSM8K | Math | **92.49** | 88.63 | 82.64 | <u>91.9</u> | 82.41 | - | - | - |
| Math500 | Math | <u>89.4</u> | 88.2 | 87.4 | 75.8 | **92.2** | 75.8 | 85.6 | 81.4 |
| AIME2024 | Math | 53.3 | <u>69.7</u> | **72.6** | 6.67 | 50.0 | 8.6 | 64.0 | 40 |
| OlympiadBench (Math, EN) | Math | <u>59.94</u> | 56.82 | 55.34 | 38.58 | **62.17** | - | - | 59.2 |
| GPQA-Diamond | Science/Reasoning | <u>61.11</u> | 59.1 | 58.08 | 33.33 | 52.5 | 46.5 | 60 | **75.2** |
| LiveCodeBench | Coding | 50.58 | 59.88 | <u>61.65</u> | 19.8 | 59.12 | 50.48 | **72.75** | 59.14 |
| KMMLU-direct | Korean | <u>59.9</u> | 50.5 | 58.62 | 50.72 | **62.96** | - | - | - |
| KMMLU-hard | Korean | <u>36.65</u> | 25.34 | 33.67 | 25.46 | **37.98** | - | - | - |
| KoBEST | Korean | 83.05 | 74.32 | 78.53 | **86.54** | <u>85.93</u> | - | - | - |
| MMLU-Pro | General | <u>57.64</u> | 50.55 | **59.58** | - | 46.82 | - | - | - |
- The highest score in each row is in bold and the second-highest is underlined.
- All benchmarks were evaluated with [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) and [skythought-eval](https://github.com/NovaSky-AI/SkyThought/tree/main/skythought/evals).
## License

This model is released under the CC BY-NC 4.0 license. If you have any questions or commercial usage inquiries, please [contact us](https://www.dnotitia.com/contact/post-form).
## Citation

If you use or discuss this model in your research, please cite:

```bibtex
@misc{dnar12025,
  title={DNA R1},
  author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/dnotitia/DNA-R1}
}
```