Diraya-3B-Instruct-Ar: an open-source Arabic reasoning model for stronger logical reasoning and math problem solving

Diraya 3B Instruct Ar

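Developed by Omartificial-Intelligence-Space

An Arabic reasoning language model fine-tuned from Qwen2.5-3B, focused on improving the logical reasoning and math problem-solving abilities of Arabic language models.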

Large Language Model

Transformers

Language: Arabic
License: Apache-2.0
Tags: #ArabicReasoning #StructuredXMLOutput #MathProblemSolving

Downloads: 86

Released: 3/15/2025

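Model overview

Diraya-3B-Instruct-Ar belongs to the DIRA (Diraya Arabic Reasoning AI) series. It is optimized for complex reasoning tasks in Arabic, writes out its reasoning in a structured XML format, and has improved multi-step math problem solving.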

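Model features

Arabic-first reasoning

Optimized specifically for complex reasoning tasks in Arabic

Structured reasoning format

Trained to output its reasoning process in a clear XML format

Mathematical reasoning

Improved ability to solve multi-step math problems

Instruction tuning

Follows Arabic instructions reliably

Lightweight

Built on an efficient 3-billion-parameter architecture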

Model capabilities

Arabic text generation

Mathematical reasoning

Logical reasoning

Instruction following

Structured output

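Use cases

Education

Math problem solving

Solves grade-school math problems posed in Arabic and shows step-by-step reasoning

Produces the reasoning steps and the final answer in a structured XML format

Research

Arabic NLP research

Evaluating the reasoning ability of Arabic language models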

🚀 Diraya-3B-Instruct-Ar

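Diraya-3B-Instruct-Ar is a language model specialized for reasoning in Arabic, fine-tuned from Qwen2.5-3B. It belongs to the DIRA (Diraya Arabic Reasoning AI) series, which focuses on improving the logical and mathematical reasoning of Arabic language models.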

🚀 Quick start

Install dependencies

First, make sure the required libraries are installed:

```bash
pip install transformers peft vllm unsloth
```

Code example

```python
from unsloth import FastLanguageModel

max_seq_length = 1024  # Can increase for longer reasoning traces
lora_rank = 64  # Larger rank = smarter, but slower

# Load the model and tokenizer in 4-bit with vLLM-backed inference
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Omartificial-Intelligence-Space/Diraya-3B-Instruct-Ar",
    max_seq_length = max_seq_length,
    load_in_4bit = True,  # False for LoRA 16bit
    fast_inference = True,  # Enable vLLM fast inference
    max_lora_rank = lora_rank,
)

# System prompt to enforce XML structure
system_prompt = """
Respond in the following format in Arabic language only:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

# Prepare user question
user_question = "كل يوم، تُطعم وندي كل دجاجة من دجاجاتها ثلاث أكواب من العلف المختلط. تقدم الدجاجات وجباتهم في ثلاث وجبات منفصلة. في الصباح، تعطي قطيعها من الدجاج 15 كوبًا من العلف. في فترة ما بعد الظهر، تعطي دجاجاتها 25 كوبًا أخرى من العلف. كم عدد أكواب العلف التي تحتاجها لتقديمها لدجاجاتها في الوجبة الأخيرة من اليوم إذا كان حجم قطيع وندي 20 دجاجة؟"

# Prepare input for the model
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_question}
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate response
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  # Sample explicitly so temperature and top_p take effect
    temperature=0.7,
    top_p=0.95
)
# The decoded text contains the prompt followed by the model's reply
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
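
To check the model's final answer for the sample question, the arithmetic can be worked out by hand. The question states that Wendi feeds each of her 20 chickens three cups of feed per day across three meals, giving 15 cups in the morning and 25 in the afternoon, and asks how many cups the last meal needs. The variable names below are illustrative only.

```python
# Worked solution to the sample question, for checking the model's answer.
chickens = 20            # flock size stated in the question
cups_per_chicken = 3     # cups of feed per chicken per day
morning_cups = 15
afternoon_cups = 25

daily_total = chickens * cups_per_chicken                       # 60 cups per day
final_meal_cups = daily_total - morning_cups - afternoon_cups   # what is left for the last meal

print(final_meal_cups)   # 20
```

A correct response should therefore end with 20 as its answer.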

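✨ Key features

- **Arabic-first reasoning**: optimized specifically for complex reasoning tasks in Arabic.
- **Structured reasoning format**: trained to output its reasoning process in a clear XML format.
- **Mathematical reasoning**: improved ability to solve multi-step math problems.
- **Instruction following**: follows Arabic instructions reliably.
- **Lightweight**: built on an efficient 3B-parameter architecture.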
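
Because the model is trained to wrap its output in `<reasoning>` and `<answer>` tags, downstream code can split a response into its two parts. The following is a minimal sketch, not part of the model's tooling: `parse_response` is a hypothetical helper, and the sample string is hand-written rather than real model output. It keeps the last match for each tag because the quick-start example decodes the full sequence, so the decoded text also contains the system prompt's `...` template.

```python
import re

def parse_response(text: str) -> dict:
    """Split a tagged response into its reasoning and answer sections.

    A section is None when its tags are missing, since the model is not
    guaranteed to follow the format on every generation.
    """
    def section(tag: str):
        # Non-greedy match across newlines; keep the last occurrence so a
        # template echoed from the prompt does not shadow the real reply.
        matches = re.findall(rf"<{tag}>\s*(.*?)\s*</{tag}>", text, flags=re.DOTALL)
        return matches[-1] if matches else None

    return {"reasoning": section("reasoning"), "answer": section("answer")}

# Hand-written sample in the expected format (not actual model output).
sample = "<reasoning>\n20 × 3 = 60\n60 - 15 - 25 = 20\n</reasoning>\n<answer>\n20\n</answer>"
parsed = parse_response(sample)
print(parsed["answer"])  # 20
```

Returning `None` for a missing section lets the caller decide whether to retry the generation or fall back to the raw text.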

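📚 Detailed documentation

Model description

Diraya-3B-Instruct-Ar is an Arabic reasoning language model fine-tuned from Qwen2.5-3B. It is part of the DIRA (Diraya Arabic Reasoning AI) collection, which focuses on improving the logical and mathematical reasoning of Arabic language models.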

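Technical details

| Property | Details |
|---|---|
| Base model | Qwen2.5-3B via `unsloth/Qwen2.5-3B-Instruct-unsloth-bnb-4bit` |
| Model type | Instruction-tuned causal language model |
| Architecture | 36 Transformer layers; grouped-query attention (GQA) with 16 query heads and 2 key/value heads; context length of 32,768 tokens |
| Training method | Fine-tuned with `GRPO`; training emphasized a structured reasoning output format using XML tags; mathematical reasoning was optimized on the Arabic GSM8K dataset; multiple reward functions were used, covering correctness, format adherence, and output structure |

LoRA configuration:

```json
{
  "peft_type": "LORA",
  "r": 64,
  "lora_alpha": 64,
  "lora_dropout": 0,
  "target_modules": [
    "k_proj", "gate_proj", "o_proj", "down_proj",
    "v_proj", "up_proj", "q_proj"
  ],
  "bias": "none",
  "inference_mode": true
}
```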
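
To give a sense of what the LoRA configuration implies, the sketch below estimates the adapter size and scaling factor. The rank, alpha, target modules, layer count, and head counts come from the details above. The hidden size (2048) and MLP intermediate size (11008) are assumptions taken from the published Qwen2.5-3B configuration and are not stated in this card, as is the use of standard `lora_alpha / r` scaling.

```python
# LoRA settings from the adapter configuration.
r, lora_alpha = 64, 64

# Architecture values listed in the technical details.
num_layers, num_q_heads, num_kv_heads = 36, 16, 2

# ASSUMPTION: dimensions from the public Qwen2.5-3B config, not from this card.
hidden_size = 2048
intermediate_size = 11008

head_dim = hidden_size // num_q_heads   # 128
q_dim = num_q_heads * head_dim          # 2048
kv_dim = num_kv_heads * head_dim        # 256 (GQA shrinks the K/V projections)

# (in_features, out_features) of every targeted projection in one layer.
shapes = {
    "q_proj": (hidden_size, q_dim),
    "k_proj": (hidden_size, kv_dim),
    "v_proj": (hidden_size, kv_dim),
    "o_proj": (q_dim, hidden_size),
    "gate_proj": (hidden_size, intermediate_size),
    "up_proj": (hidden_size, intermediate_size),
    "down_proj": (intermediate_size, hidden_size),
}

# Each adapted projection adds A (r x in) and B (out x r): r * (in + out) weights.
per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
total = per_layer * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"{total:,} adapter parameters, scaling {scaling}")
# 119,734,272 adapter parameters, scaling 1.0
```

Under these assumptions the adapter holds roughly 120 million weights, a small fraction of the 3B-parameter base model.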

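### Training data
The model was fine-tuned primarily on the following dataset:
- [**Arabic GSM8K dataset**](https://huggingface.co/datasets/Omartificial-Intelligence-Space/Arabic-gsm8k): a collection of grade-school math word problems, translated into Arabic, that require multi-step reasoning.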

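### Training and evaluation results
![image/png](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/21ZC26E-3Zh2Xw3a6h3WL.png)

*Figure: reward components over training steps, showing how the model's performance on each reward function evolved.*

Training used several reward functions, each targeting a different aspect of the output:
- **Correctness reward** (red): measures whether the model produces the correct final answer.
- **Integer reward** (blue): checks that the model outputs a valid numeric response.
- **Format reward** (purple/gray): encourages adherence to the required XML structure.
- **XML count reward** (yellow): refines the exact placement and completeness of the XML tags.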
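
The four rewards listed above could be implemented along the following lines. This is a simplified sketch under stated assumptions, not the training code used for this model: the function names, signatures, and reward magnitudes (2.0, 0.5, 0.125) are illustrative, and each function scores a single completion string.

```python
import re

# Whole completion must be a reasoning block followed by an answer block.
FORMAT_RE = re.compile(
    r"\s*<reasoning>.*?</reasoning>\s*<answer>.*?</answer>\s*", re.DOTALL
)

def extract_answer(text: str) -> str:
    """Return the text inside the last <answer>...</answer> pair, or ''."""
    found = re.findall(r"<answer>\s*(.*?)\s*</answer>", text, flags=re.DOTALL)
    return found[-1] if found else ""

def correctness_reward(completion: str, reference: str) -> float:
    # Largest reward: the extracted answer equals the reference answer.
    return 2.0 if extract_answer(completion) == reference.strip() else 0.0

def integer_reward(completion: str) -> float:
    # Small reward for an answer made up only of digits.
    return 0.5 if extract_answer(completion).isdigit() else 0.0

def format_reward(completion: str) -> float:
    # Reward completions that match the expected XML layout end to end.
    return 0.5 if FORMAT_RE.fullmatch(completion) else 0.0

def xml_count_reward(completion: str) -> float:
    # Partial credit for each tag that appears exactly once.
    tags = ("<reasoning>", "</reasoning>", "<answer>", "</answer>")
    return float(sum(0.125 for tag in tags if completion.count(tag) == 1))

def total_reward(completion: str, reference: str) -> float:
    return (correctness_reward(completion, reference) + integer_reward(completion)
            + format_reward(completion) + xml_count_reward(completion))

# Hand-written completion in the expected format (not actual model output).
good = "<reasoning>\n60 - 15 - 25 = 20\n</reasoning>\n<answer>\n20\n</answer>"
print(total_reward(good, "20"))  # 3.5
```

Splitting the signal this way gives partial credit for well-formed output even when the final answer is wrong, which is one plausible reason the format-related curves can rise before the correctness curve does.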

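As the figure shows, the model improved steadily on every reward dimension over the course of training. A higher reward value indicates higher-quality output that satisfies several optimization criteria at once. This multi-objective approach lets the model not only produce correct answers but also present them with clear, structured reasoning.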
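
The training method named in the technical details is read here as group relative policy optimization (GRPO), in which several completions are sampled per prompt and each one is scored relative to the others in its group. The sketch below shows only that normalization step, assuming the per-completion reward is the plain sum of the individual reward functions. The group of four rewards and the `eps` value are made up for illustration, and implementations differ on details such as which standard deviation estimator they use.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's completions to zero mean.

    Completions that beat the group average get a positive advantage and
    are reinforced; those below it get a negative advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(reward - mu) / (sigma + eps) for reward in rewards]

# Hypothetical summed rewards for four completions of the same prompt.
rewards = [3.5, 1.0, 0.0, 3.5]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.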

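The model performs well on Arabic mathematical reasoning tasks, with particular strengths in:
- Generating well-structured reasoning steps.
- Following the required XML output format.
- Arriving at correct numerical answers to multi-step problems.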

### Citation
If you use this model in your research, please cite:
```bibtex
@misc{diraya3b,
  title={Diraya-3B-Instruct-Ar: An Arabic Reasoning-Specialized Language Model},
  author={Omartificial-Intelligence-Space},
  year={2025},
  howpublished={\url{https://huggingface.co/Omartificial-Intelligence-Space/Diraya-3B-Instruct-Ar}}
}
```

### Acknowledgements
This model is built on the Qwen2.5-3B model from the Qwen team and uses Unsloth's optimization techniques. We thank them for their valuable contributions to the field of language modeling.

```bibtex
@misc{qwen2.5,
    title = {Qwen2.5: A Party of Foundation Models},
    url = {https://qwenlm.github.io/blog/qwen2.5/},
    author = {Qwen Team},
    month = {September},
    year = {2024}
}

@article{qwen2,
      title={Qwen2 Technical Report},
      author={An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
      journal={arXiv preprint arXiv:2407.10671},
      year={2024}
}
```