# DNA-R1
We introduce DNA-R1, a specialized reasoning model optimized for the Korean language and built upon Microsoft's Phi-4. By applying large-scale reinforcement learning (RL) using the same methodology as DeepSeek-R1, we substantially enhanced the model's Korean reasoning abilities. The model shows a deep understanding of Korean text and strong reasoning skills in mathematics, coding, and general reasoning tasks.
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained('dnotitia/DNA-R1')
model = AutoModelForCausalLM.from_pretrained('dnotitia/DNA-R1', device_map='auto')
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# The prompt quotes a well-known Korean song about a mother who claims to
# dislike jajangmyeon (black-bean noodles), then asks the model to explain
# why she said so.
conversation = [
    {"role": "user", "content": """
어려서부터 우리 집은 가난했었고
남들 다 하는 외식 몇 번 한 적이 없었고
일터에 나가신 어머니 집에 없으면
언제나 혼자서 끓여 먹었던 라면
그러다 라면이 너무 지겨워서
맛있는 것 좀 먹자고 대들었어
그러자 어머니가 마지못해 꺼내신
숨겨두신 비상금으로 시켜주신
짜장면 하나에 너무나 행복했었어
하지만 어머니는 왠지 드시지 않았어
어머니는 짜장면이 싫다고 하셨어
어머니는 짜장면이 싫다고 하셨어
야이야~야 그렇게 살아가고
그렇게 후회하고 눈물도 흘리고
야이야~야 그렇게 살아가고
너무나 아프고 하지만 다시 웃고
---
친구가 쓴 시인데, 여기서 친구의 어머니가 짜장면이 싫다고 하신 이유는?"""},
]

inputs = tokenizer.apply_chat_template(conversation,
                                       add_generation_prompt=True,
                                       return_dict=True,
                                       return_tensors="pt").to(model.device)
_ = model.generate(**inputs, streamer=streamer)
```
## Features

- **Korean-optimized reasoning**: Tailored to understand and reason in Korean, with enhanced capabilities in math, coding, and general reasoning.
- **Multi-stage training**: Learned reasoning patterns specific to the Korean language through a three-stage training pipeline.
- **Advanced capabilities**: Demonstrates self-verification, reflection, and generation of long chains of thought (CoT).
## Installation

No separate installation step is required: the model is distributed through the Hugging Face Hub, and the Quick Start snippet above downloads the model and tokenizer automatically on first use.
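The original card does not list dependencies; as an assumed minimal environment, the Quick Start snippet needs PyTorch, `transformers`, and `accelerate` (the latter for `device_map='auto'`):

```shell
# Assumed dependency set for the Quick Start snippet; pin versions as needed.
pip install torch transformers accelerate
```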
## Documentation

### Training Methodology

Our training pipeline consists of three stages:

- **Stage 1**: Initial SFT with a large Korean non-reasoning dataset (760k examples) reused from our DNA 1.0 8B Instruct training pipeline.
- **Stage 2**: Integration of Korean reasoning patterns from DeepSeek-R1 using a specialized Korean reasoning dataset (300k examples).
- **Stage 3**: Reinforcement learning with GRPO on a combined Korean/English reasoning dataset, with format, accuracy, and language consistency as rewards.
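The Stage 3 reward design is not published in detail. As an illustrative sketch only, the three named reward components might be combined along these lines; the regexes, weights, and Hangul-range heuristic below are our assumptions, not DNA-R1's actual reward functions:

```python
import re

# Expected completion template: <think>...</think> followed by <answer>...</answer>.
THINK_ANSWER = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>\s*$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>/<answer> template."""
    return 1.0 if THINK_ANSWER.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the text inside <answer> exactly matches the reference answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

def language_reward(completion: str, target: str = "ko") -> float:
    """Fraction of letters (tags stripped) in the target script -- a crude proxy."""
    body = re.sub(r"</?(think|answer)>", "", completion)
    letters = [ch for ch in body if ch.isalpha()]
    if not letters:
        return 0.0
    if target == "ko":
        hits = sum(1 for ch in letters if "\uac00" <= ch <= "\ud7a3")  # Hangul syllables
    else:
        hits = sum(1 for ch in letters if ch.isascii())
    return hits / len(letters)

def total_reward(completion: str, gold: str, target: str = "ko") -> float:
    # Illustrative weighting; the weights used in training are unknown.
    return (format_reward(completion)
            + accuracy_reward(completion, gold)
            + 0.5 * language_reward(completion, target))
```

In a real GRPO loop, such a scalar reward would be computed per sampled completion and fed to the policy update.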
### Model Specifications

| Property | Details |
|----------|---------|
| Developed by | Dnotitia Inc. |
| Supported Languages | Korean, English |
| Model Release Date | Mar 6, 2025 |
| Number of Parameters | 14B |
| License | CC BY-NC 4.0 |
### Technical Details

#### Multi-Stage Training Pipeline

We implemented a multi-stage training approach to enhance Phi-4's Korean reasoning capabilities:

- **Initial Foundation (Stage 1)**: Supervised fine-tuning using our extensive Korean non-reasoning dataset from the established DNA 1.0 8B Instruct training pipeline.
- **Reasoning Integration (Stage 2)**: Adaptation of DeepSeek-R1's reasoning patterns with Korean-specific optimization through a carefully curated dataset.
- **Advanced Refinement (Stage 3)**: Reinforcement learning with GRPO to refine reasoning in both Korean and English, with reward signals for format structure, factual accuracy, and language consistency.

This approach enables DNA-R1 to develop sophisticated chain-of-thought (CoT) reasoning for complex problem solving, resulting in a model finely calibrated for Korean-language reasoning while maintaining robust general capabilities.
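Since GRPO may be unfamiliar, here is a minimal sketch of its core idea, group-relative advantage estimation: rewards for several completions sampled from the same prompt are normalized within the group, which removes the need for a learned value/critic network. This is a generic illustration, not DNA-R1's training code:

```python
def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each completion's reward against the
    mean and standard deviation of its sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions that beat their group's average get positive advantages (reinforced), below-average ones get negative advantages, and the advantages always sum to zero within a group.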
### Performance Highlights

Our Korean-specific multi-stage training pipeline significantly enhances the Phi-4 base model's understanding of Korean context, reasoning depth, and response quality. The model excels at:

- Generating nuanced Korean chains of thought (CoT).
- Performing rigorous self-verification.
- Solving complex multi-step problems.
- Maintaining cultural and linguistic context in reasoning.
- Separating deep thinking from concise answers using the `<think>` and `<answer>` tags.
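As a small convenience sketch (our own helper, not part of the model release), a completion can be split on these documented tags:

```python
import re

def split_completion(text: str) -> tuple[str, str]:
    """Split a DNA-R1 completion into (chain_of_thought, final_answer).

    Returns an empty string for either part whose tag pair is missing."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    cot = think.group(1).strip() if think else ""
    final = answer.group(1).strip() if answer else ""
    return cot, final
```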
### Evaluation Results

Below are our evaluation results for DNA-R1 across math, coding, science, Korean, and general-performance benchmarks. Despite having only 14B parameters, DNA-R1 outperforms many larger models across a range of benchmarks.
| Benchmark | Task | DNA-R1 (14B) | DeepSeek-R1-Distill-Qwen-14B | DeepSeek-R1-Distill-Qwen-32B | EXAONE-3.5-32B-Instruct | QwQ-32B-Preview | gpt-4o-0513 | o1-mini | o1-preview |
|---|---|---|---|---|---|---|---|---|---|
| GSM8K | Math | **92.49** | 88.63 | 82.64 | <u>91.9</u> | 82.41 | - | - | - |
| Math500 | Math | <u>89.4</u> | 88.2 | 87.4 | 75.8 | **92.2** | 75.8 | 85.6 | 81.4 |
| AIME2024 | Math | 53.3 | <u>69.7</u> | **72.6** | 6.67 | 50.0 | 8.6 | 64.0 | 40 |
| OlympiadBench (Math, EN) | Math | <u>59.94</u> | 56.82 | 55.34 | 38.58 | **62.17** | - | - | 59.2 |
| GPQA-Diamond | Science/Reasoning | <u>61.11</u> | 59.1 | 58.08 | 33.33 | 52.5 | 46.5 | 60 | **75.2** |
| LiveCodeBench | Coding | 50.58 | 59.88 | <u>61.65</u> | 19.8 | 59.12 | 50.48 | **72.75** | 59.14 |
| KMMLU-direct | Korean | <u>59.9</u> | 50.5 | 58.62 | 50.72 | **62.96** | - | - | - |
| KMMLU-hard | Korean | <u>36.65</u> | 25.34 | 33.67 | 25.46 | **37.98** | - | - | - |
| KoBEST | Korean | 83.05 | 74.32 | 78.53 | **86.54** | <u>85.93</u> | - | - | - |
| MMLU-Pro | General | <u>57.64</u> | 50.55 | **59.58** | - | 46.82 | - | - | - |
- The highest score in each row is in bold and the second-highest is underlined.
- All benchmarks were evaluated with [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) and [skythought-eval](https://github.com/NovaSky-AI/SkyThought/tree/main/skythought/evals).
## License

This model is released under the CC BY-NC 4.0 license. If you have any questions or commercial usage inquiries, please [contact us](https://www.dnotitia.com/contact/post-form).
## Citation

If you use or discuss this model in your research, please cite:

```bibtex
@misc{dnar12025,
  title={DNA R1},
  author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/dnotitia/DNA-R1}
}
```