Qwen2.5-Coder-1.5B-Instruct-Reasoning: Open-Source Code Reasoning Model for Programming Problem Solving and Logical Reasoning

Qwen2.5 Coder 1.5B Instruct Reasoning

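Developed by bunyaminergen

A code reasoning model fine-tuned from Qwen2.5-Coder-1.5B-Instruct, with stronger programming problem-solving and logical reasoning abilities.

Large Language Model

Safetensors

Supports multiple languages | License: GPL-3.0 | #CodeReasoningEnhancement #PythonScriptGeneration #ChainOfThoughtTags

Downloads: 130

Released: 4/27/2025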

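Model Overview

The model was trained with supervised fine-tuning on the nvidia/OpenCodeReasoning dataset, with a focus on code generation and logical reasoning, which makes it suitable for use as a programming assistant.

Model Features

Enhanced code reasoning

The model shows its reasoning process inside the special <think></think> tags, which makes its code generation easier to follow logically.

Programming problem solving

The model can understand the requirements of a programming problem and generate a correct code solution.

Efficient fine-tuning

The model was fine-tuned with QLoRA, which preserves the base model's capabilities while strengthening specific skills.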

Model Capabilities

Code generation

Programming question answering

Logical reasoning

Code explanation

Algorithm implementation

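Use Cases

Programming education

Programming practice tutoring

Helps students understand programming concepts and generates example code

Provides correct code implementations with detailed explanations

Development assistance

Algorithm implementation

Generates implementations of common algorithms on request

Examples include standard implementations of primality checks and sorting algorithms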

🚀 Qwen2.5-Coder-1.5B-Instruct-Reasoning

The Qwen2.5-Coder-1.5B-Instruct-Reasoning model was trained with supervised fine-tuning (SFT) on the nvidia/OpenCodeReasoning dataset to strengthen its reasoning ability.

🚀 Quick Start

Hugging Face usage example

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "Qwen/Qwen2.5-Coder-1.5B-Instruct"
adapter_repo = "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-Reasoning"

# Load the tokenizer from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(adapter_repo, trust_remote_code=True)

# Load the base model; the adapter is attached on top of it below.
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype="auto",
)

# Resize the embeddings to the tokenizer's vocabulary size before loading the adapter.
model.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(model, adapter_repo)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Please provide a Python script that checks whether a number is prime."}
]

# Render the chat messages into a single prompt string.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048
)

# The decoded text contains the prompt followed by the generated reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

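✨ Key Features

Supervised fine-tuning on the nvidia/OpenCodeReasoning dataset strengthens the model's reasoning ability. On code-related tasks, it shows stronger logical reasoning and problem-solving than the base model.

💻 Usage Examples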

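Advanced Usage

In practice, you can adjust the input messages and the generation parameters to suit different tasks. For example, change max_new_tokens to control the length of the generated text.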
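As a rough illustration of that kind of tuning, the sketch below collects a few generation settings into reusable presets. The preset names and values are illustrative assumptions, not settings recommended by the model author. The keys (`max_new_tokens`, `do_sample`, `temperature`, `top_p`) are standard keyword arguments of `generate` in Hugging Face Transformers.

```python
# Hypothetical presets for `model.generate`; names and values are illustrative
# assumptions, not recommendations from the model author.
GENERATION_PRESETS = {
    # Short, deterministic answers (greedy decoding).
    "short": {"max_new_tokens": 512, "do_sample": False},
    # More room for a long <think> trace, still deterministic.
    "long_reasoning": {"max_new_tokens": 4096, "do_sample": False},
    # Sampled output for more varied solutions.
    "sampled": {
        "max_new_tokens": 2048,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
    },
}


def build_generation_kwargs(preset: str, **overrides) -> dict:
    """Return a copy of the chosen preset with any overrides applied."""
    if preset not in GENERATION_PRESETS:
        raise KeyError(f"unknown preset: {preset!r}")
    kwargs = dict(GENERATION_PRESETS[preset])
    kwargs.update(overrides)
    return kwargs


if __name__ == "__main__":
    # With a loaded model and tokenized `inputs`, the result would be used as:
    #   model.generate(**inputs, **build_generation_kwargs("short"))
    print(build_generation_kwargs("sampled", max_new_tokens=1024))
```

Keeping the presets in one dictionary makes it easy to compare settings across tasks without editing the generation call itself.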

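📚 Detailed Documentation

Comparison

Note the special <think> </think> tags in the reasoning model's output. As the comparison shows, the reasoning model can now "think" before it answers.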
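When post-processing such output, it can help to separate the reasoning trace from the final answer. The following is a minimal sketch, assuming the tags appear literally in the decoded text; the helper name `split_reasoning` is hypothetical. If the tokenizer registers the tags as special tokens, decoding with `skip_special_tokens=True` would strip them, and the text would need to be decoded with `skip_special_tokens=False` first.

```python
import re

# Matches the first <think>...</think> span, including newlines inside it.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Anything before the opening tag (for example an echoed prompt) is
    discarded. If no complete <think>...</think> pair is present, the
    reasoning is empty and the whole text is treated as the answer.
    """
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = "<think>\nCheck divisors up to sqrt(n).\n</think>\n\nprint('prime')"
    reasoning, answer = split_reasoning(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)
```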

Base Model

`System:` You are a helpful coding assistant.
`User:` Please provide a Python script that checks whether a number is prime.
`Assistant:`
```python
def check_prime(num):
    # Check if the number is less than 2
    if num < 2:
        return False
    # Check for factors from 2 to the square root of the number
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    # If no factors were found, the number is prime
    return True

# Example usage
number = 7
if check_prime(number):
    print(f"{number} is a prime number.")
else:
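    print(f"{number} is not a prime number.")
```

In this Python script, the check_prime function takes an integer as input and returns True if the number is prime and False otherwise. It first checks whether the number is less than 2, because numbers less than 2 are not prime. It then iterates from 2 up to the square root of the number (inclusive). If it finds any factor of the number in that range, it immediately returns False, indicating that the number is not prime. If no factor is found after checking all candidate divisors, the function concludes that the number is prime and returns True.

The example usage shows how to call check_prime to test whether 7 is prime. The output indicates whether 7 is a prime number.

Reasoning Model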

`System:` You are a helpful coding assistant.
`User:` Please provide a Python script that checks whether a number is prime.
`Assistant:`
`<think>`
Okay, I need to write a Python program that checks if a given number is prime. Hmm, what's a prime number? Oh right, a
prime number has exactly two distinct positive divisors: 1 and itself. So for a number to be prime, it must not have any
divisors other than those two.

First, let's think about the steps involved. The first thing is to determine if there are any divisors other than 1 and
the number itself. How do I check that?

Well, for a number n, starting from 2 up to sqrt(n), check if any of them divide n. If any do, then it's not prime. But
wait, maybe even before checking up to the square root, some numbers might have factors beyond that. For example, take 9. The
square root is 3. So checking 2,3,4 would catch 9's factors. Wait, but 3 divides 9, so the loop from 2 to sqrt(n)
(which is 3 in this case) will check 2 and 3. So yes, checking up to the square root is sufficient because any
composite number has at least one factor less than or equal to its square root.

So the plan is:

1. Read an integer from input.
2. Check if the number is less than 2. If so, it's not prime.
3. Otherwise, check from 2 up to the square root of the number. For each i, if the number mod i is zero, return False.
4. If none of the numbers in this range divide the number, return True.

Wait, but what about 0 and 1? Well, according to the problem statement, the input could be 0 or 1, which are not primes.
So we should handle those cases separately.

Let me outline the code structure.

Read the input number as an integer. Let's call it num.

Handle edge cases first:

- If num < 2: print "No" (since primes are >=2)
  Else:
  Compute sqrt_num = int(math.sqrt(num))
  Then loop from 2 to sqrt_num inclusive. For each i, check if num % i ==0. If any such i exists, return False. Else,
  after checking all, return True.

But how to compute the square root accurately in Python? Using math.isqrt for integers, which returns the floor of the
square root. That's better than using ** operator since it avoids floating point inaccuracies.

Wait, but what if the number is a perfect square? Like 9. Then math.isqrt(9) is 3. So in that case, the loop would check
up to 3. But when i reaches 3, 9%3 is 0, so it would return False, which is correct.

So the code would look something like this:

import math

n = int(input())
if n < 2:
    print("No")
else:
    sqrt_n = math.isqrt(n)
    is_prime = True
    for i in range(2, sqrt_n +1):
        if n % i == 0:
            is_prime = False
            break
    print("Yes" if is_prime else "No")

Wait, but wait, what about numbers like 25. The square root is 5. So the loop runs from 2 to 5. When i=5, 25%5 is 0, so
returns false. Correct.

This logic seems solid. Let me test some examples.

Test case 1: input 7. The loop runs up to 2. 2 to 2. Check 2. 7%2 is 1, so returns true. Output Yes.

Test case 2: input 6. sqrt(6) is ~2.449. So sqrt_n is 2. Loop from 2 to 2. Check 2. 6%2 is 0. So returns false. Output
No.

Another test case: 8. sqrt(8) is ~2.828. So sqrt_n is 2. Check 2. 8%2 is 0. So output No.

What about a prime number like 13. sqrt(13) is ~3.605. So loop up to 3. Check 2 and 3. 13%2 is 1, 13%3 is 1. So returns
true. Correct.

Edge case: 1. Output No.

Another edge case: 2. Output Yes.

What about a very large number, say 999983 (a prime). The code should check up to sqrt(999983) which is around 999.9. So
the loop runs 2 to 999. It will check all possible factors. Since it's a prime, none will divide it. So returns Yes.

So the code seems to handle all cases correctly.

I think that's the solution. Now, writing the code accordingly.
`</think>`

```python
import math

n = int(input())
if n < 2:
    print("No")
else:
    sqrt_n = math.isqrt(n)
    is_prime = True
    for i in range(2, sqrt_n + 1):
        if n % i == 0:
            is_prime = False
            break
    print("Yes" if is_prime else "No")
```

Dataset

nvidia/OpenCodeReasoning

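Training

Base Parameters

Parameter	Details
Base model	`Qwen/Qwen2.5-Coder-1.5B-Instruct`
Fine-tuning method	QLoRA
Task type	`CAUSAL_LM`
Epochs	`3`
Batch size	`1`
Gradient accumulation steps	`1`
Effective batch size	`1`
Learning rate	`2e-4`
LR scheduler type	`cosine`
Warmup ratio	`0.05`
Precision	`FP16 mixed precision`
Gradient checkpointing	`True`
Completion-only loss	`True`
Packing	`False`
Max sequence length	`8192 tokens`
Logging steps	every `10000` steps
Checkpoint save steps	every `10000` steps
Output directory	`.model`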
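To make the scheduler settings in the table concrete, here is a minimal sketch of a linear-warmup, cosine-decay schedule with peak learning rate `2e-4` and warmup ratio `0.05`. The total of 24,000 optimizer steps is an assumption taken from the final checkpoint name listed later in this card, and the formula follows the common warmup-plus-cosine form rather than the author's exact training code.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 2e-4,
               warmup_ratio: float = 0.05) -> float:
    """Learning rate with linear warmup followed by cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 24_000  # assumption: matches the final checkpoint-24000
    for step in (0, 600, 1_200, 12_600, 24_000):
        print(step, f"{lr_at_step(step, total):.2e}")
```

With these settings the warmup phase covers the first 5% of training, after which the rate falls smoothly toward zero.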

PEFT/LoRA Parameters

Parameter	Details
LoRA rank (`r`)	`16`
LoRA alpha	`32`
LoRA dropout	`0.05`
LoRA bias	`none`
Task type	`CAUSAL_LM`
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Modules to save	`embed_tokens`, `lm_head`
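As a worked example of what these settings imply, the sketch below computes the LoRA scaling factor (`alpha / r`) and a rough count of adapter parameters across the target modules. The layer dimensions (hidden size 1536, MLP size 8960, 28 layers, key/value projection width 256) are assumptions about the base model's configuration, not values stated in this card. The count excludes `embed_tokens` and `lm_head`, which are saved in full rather than adapted.

```python
# LoRA settings from the table above.
LORA_R = 16
LORA_ALPHA = 32

# ASSUMED base-model dimensions (not stated in this card); adjust as needed.
HIDDEN = 1536        # hidden size
KV_DIM = 256         # output width of k_proj / v_proj
INTERMEDIATE = 8960  # MLP intermediate size
NUM_LAYERS = 28

# (in_features, out_features) for each target module in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A LoRA adapter adds matrices A (r x in) and B (out x r)."""
    return r * (in_features + out_features)


def total_lora_params(r: int = LORA_R) -> int:
    """Adapter parameters summed over all target modules and layers."""
    per_layer = sum(lora_params(i, o, r) for i, o in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


if __name__ == "__main__":
    print("scaling factor:", LORA_ALPHA / LORA_R)
    print("adapter parameters:", total_lora_params())
```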

Model Parameters

Parameter	Details
Name	`Qwen/Qwen2.5-Coder-1.5B-Instruct`
Attention implementation	`flash_attention_2`
Load in 4-bit	`true`
4-bit quantization type	`nf4`
4-bit double quantization	`true`
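For a sense of what 4-bit loading with double quantization saves, the sketch below estimates the storage cost per quantized weight. The block sizes (64 weights per quantization block, 256 constants per second-level block) follow the defaults described for QLoRA and are assumptions here, as is the round figure of 1.5 billion quantized weights. Embeddings and other unquantized tensors are ignored.

```python
def bits_per_weight(double_quant: bool, block_size: int = 64,
                    second_block_size: int = 256) -> float:
    """Approximate storage per weight for 4-bit block-wise quantization."""
    if double_quant:
        # One 8-bit quantized constant per block, plus one FP32 constant per
        # group of `second_block_size` first-level constants.
        overhead = 8 / block_size + 32 / (block_size * second_block_size)
    else:
        # One FP32 scaling constant per block.
        overhead = 32 / block_size
    return 4 + overhead


def weight_gigabytes(num_weights: float, double_quant: bool) -> float:
    """Estimated size in GB (10^9 bytes) of the quantized weights."""
    return num_weights * bits_per_weight(double_quant) / 8 / 1e9


if __name__ == "__main__":
    n = 1.5e9  # ASSUMED number of quantized weights
    for dq in (False, True):
        print(f"double_quant={dq}: {bits_per_weight(dq):.3f} bits/weight, "
              f"{weight_gigabytes(n, dq):.2f} GB")
```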

Dataset Parameters

Parameter	Details
Dataset name	`nvidia/OpenCodeReasoning`
Split	`split_0`
Number of rows	`8000`
Max token length	`8192`
Shuffle	`True`
Number of processes	`4`

Tokenizer Parameters

Parameter	Details
Truncation	Enabled (`max_length=8192`)
Masked language modeling (MLM)	`False`

Speed, Size, and Time

Parameter	Details
Total training time	~3.5 hours
Checkpoint frequency	every `10000` steps
Checkpoint steps	`checkpoint-10000`, `checkpoint-20000`, `checkpoint-24000`
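The checkpoint names are consistent with simple step arithmetic: 8,000 rows for 3 epochs at an effective batch size of 1 gives 24,000 optimizer steps, saved every 10,000 steps plus a final save. The sketch below reproduces that calculation. It assumes that no rows were dropped during preprocessing and that a final checkpoint is written at the last step, neither of which this card states explicitly.

```python
import math


def total_steps(rows: int, epochs: int, batch_size: int, grad_accum: int) -> int:
    """Optimizer steps for a full run over `rows` examples."""
    effective_batch = batch_size * grad_accum
    steps_per_epoch = math.ceil(rows / effective_batch)
    return steps_per_epoch * epochs


def checkpoint_steps(total: int, save_every: int) -> list[int]:
    """Steps at which checkpoints are written, including an assumed final save."""
    steps = list(range(save_every, total + 1, save_every))
    if not steps or steps[-1] != total:
        steps.append(total)  # assumed final save at the end of training
    return steps


if __name__ == "__main__":
    steps = total_steps(rows=8000, epochs=3, batch_size=1, grad_accum=1)
    print("total steps:", steps)
    print("checkpoints:", checkpoint_steps(steps, save_every=10_000))
    # Average throughput implied by the reported ~3.5 hours of training.
    print(f"steps per second: {steps / (3.5 * 3600):.2f}")
```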

Compute Infrastructure

Parameter	Details
GPU	1 × NVIDIA H100 SXM (80 GB VRAM)
RAM	125 GB
CPU	16 vCPU
Operating system	Ubuntu 22.04
Framework	PyTorch 2.4.0
CUDA version	12.4.1

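🔧 Technical Details

The model is based on Qwen/Qwen2.5-Coder-1.5B-Instruct and was fine-tuned with QLoRA. Through supervised fine-tuning on the nvidia/OpenCodeReasoning dataset, the model acquired additional code-reasoning knowledge, which strengthens its reasoning ability. Training used FP16 mixed precision and gradient checkpointing to reduce memory use and improve training efficiency.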

📄 License

LICENSE

👥 Team

Bunyamin Ergen

📚 References

This model was obtained by applying supervised fine-tuning (SFT) to the original model, Qwen/Qwen2.5-Coder-1.5B-Instruct.

📝 Citation

@software{Qwen2.5-Coder-1.5B-Instruct-Reasoning,
  author       = {Bunyamin Ergen},
  title        = {{Qwen2.5-Coder-1.5B-Instruct-Reasoning}},
  year         = {2025},
  month        = {04},
}