TraceBack-12b Open-Source Model - Free Deployment for Instruction-Following and Chain-of-Thought Reasoning Tasks

Traceback 12b

Developed by secemp9

TraceBack 12b is a 4-bit quantized model built on the Mistral-Nemo-Instruct architecture, focused on instruction-following and chain-of-thought reasoning tasks.

Large language model

Transformers

License: Apache-2.0 #instruction-tuning #4bit-quantization #chain-of-thought-reasoning

Downloads: 1,470

Released: 3/5/2025

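Model Overview

This is an optimized instruction-following model for generating solutions and chain-of-thought reasoning, quantized to 4 bits for efficient inference.

Model Features

4-bit Quantization

The model is quantized to 4 bits, which substantially reduces memory requirements while retaining good performance.

Instruction Optimization

Optimized specifically for instruction-following tasks, so it can accurately understand and carry out complex instructions.

Chain-of-Thought Reasoning

Supports chain-of-thought reasoning and can lay out the complete thought process behind a solution.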
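To make the memory claim concrete, here is a back-of-the-envelope estimate of weight storage alone for a nominal 12B-parameter model at 16 bits versus 4 bits per weight. This is a rough sketch. It assumes exactly 12 × 10^9 parameters and ignores quantization constants, activations and the KV cache, so real memory use will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate raw weight storage in gigabytes (10**9 bytes).

    Only the weights themselves are counted; quantization scales,
    activations and the KV cache are deliberately ignored.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Assumption: a nominal 12B parameter count for a "12b" model.
NUM_PARAMS = 12e9

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)
four_bit_gb = weight_memory_gb(NUM_PARAMS, 4)

if __name__ == "__main__":
    print(f"16-bit weights: ~{bf16_gb:.0f} GB")
    print(f"4-bit weights:  ~{four_bit_gb:.0f} GB")
```

Under these assumptions the weights shrink from roughly 24 GB to roughly 6 GB, a 4× reduction.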

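Model Capabilities

Text generation

Instruction understanding

Question answering

Reasoning tasks

Use Cases

Education

Solution-approach generation

Helps students understand how complex problems are solved

Provides step-by-step solution approaches

Research Assistance

Research idea expansion

Helps researchers generate and evaluate research ideas

Offers analysis from multiple perspectives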

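🚀 TraceBack 12b Release

TraceBack is an approach I came up with while thinking about how to scale reasoning-trace data generation efficiently. As it turns out, you don't have to rely solely on reasoning models (such as r1, o1, o3) to create reasoning traces!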

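It has several goals; the main ones are:

1. Faster synthetic reasoning-dataset generation: the model used here is small (smaller than r1 and similar models), so inference is faster and generation is easier to scale.
2. Distilling synthetic traces to tackle out-of-domain problems that cannot be verified.
3. Turning any non-reasoning model output or dataset, when used as input, into a synthetic reasoning dataset.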

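So far, the current proof of concept achieves goals 1 and 3, and I plan to scale it further. At this stage:

Only Mistral Nemo 12b has been used as the base model.
It was trained for only 2 epochs.
Fine-tuning (QLoRA) used only 200k samples, taken from the dataset secemp9/instruction_solution_thought.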

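So there is still plenty of room for improvement.

The model was trained with an instruction and its solution as input, and it outputs a plausible reasoning trace for that pair.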
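This input/output contract (instruction + solution in, reasoning trace out) is what makes goal 3 possible. Any existing instruction/solution dataset can be run through the model to attach a trace to each example.

The following minimal sketch assumes a hypothetical `generate_trace` callable that wraps whatever inference code you use, for example the transformers snippet in the usage section. It is stubbed here so the pipeline logic can be shown without loading the model. The field names `instruction`, `solution` and `thought` are illustrative; they are not taken from the published dataset.

```python
from typing import Callable, Dict, Iterable, List

# A trace generator takes (instruction, solution) and returns a reasoning trace.
TraceFn = Callable[[str, str], str]


def add_reasoning_traces(
    pairs: Iterable[Dict[str, str]], generate_trace: TraceFn
) -> List[Dict[str, str]]:
    """Return copies of the input pairs, each with a generated "thought" field."""
    out = []
    for pair in pairs:
        trace = generate_trace(pair["instruction"], pair["solution"])
        out.append({**pair, "thought": trace})  # new dict; the input is not mutated
    return out


def fake_generate_trace(instruction: str, solution: str) -> str:
    # Hypothetical stand-in for a real call to TraceBack-12b.
    return f"(trace explaining how {instruction!r} leads to the given solution)"


if __name__ == "__main__":
    plain = [
        {
            "instruction": "how many r in strawberry",
            "solution": 'There are **three** "r"s in "strawberry."',
        },
    ]
    for row in add_reasoning_traces(plain, fake_generate_trace):
        print(row["thought"])
```

Passing the generator in as a parameter keeps the dataset loop independent of the inference backend (transformers, unsloth, or a served endpoint).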

I believe this is the future of reasoning-data generation. Stay tuned for the evaluation release.

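🚀 Quick Start

TraceBack is designed to scale reasoning-trace data generation efficiently. Instead of relying on conventional reasoning models, it uses a small model for faster inference and dataset generation. Details about the model follow.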

Model Information

Property	Details
Model type	outputs_solution_to_thought
Training data	instruction_solution_to_thought_dataset.jsonl, secemp9/instruction_solution_thought
Base model	unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
License	apache-2.0

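✨ Key Features

Fast synthetic reasoning-dataset generation: uses a small model, so inference is fast and generation scales easily.
Cross-domain problem handling: synthetic traces can be distilled to tackle out-of-domain, non-verifiable problems.
Data conversion: turns non-reasoning model outputs or datasets into synthetic reasoning datasets.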

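📦 Installation

The documentation does not give specific installation steps; install the libraries imported in the code examples below.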
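Because no install steps are given, one practical check is which of the libraries the examples rely on can already be imported.

The package list in this small sketch is an assumption: `torch` and `transformers` for the basic example and `unsloth` for the advanced one. It also includes `accelerate` and `bitsandbytes`, which transformers typically needs for `device_map` and for bitsandbytes 4-bit checkpoints.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed dependency list; trim it to the example you plan to run.
REQUIRED = ("torch", "transformers", "accelerate", "bitsandbytes", "unsloth")


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found on sys.path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED).items():
        status = "ok" if ok else f"MISSING (try: pip install {name})"
        print(f"{name:<14} {status}")
```

`find_spec` locates a top-level module without executing it, so the check stays fast even for heavy packages like `torch`.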

💻 Usage Examples

Basic Usage

# Using transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model
model_name = "secemp9/TraceBack-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" (requires accelerate) puts the weights on the GPU when one is
# available; calling model.to(...) is not supported for bitsandbytes 4-bit models.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Define the messages: the instruction and its solution go into one user turn
messages = [
    {"role": "user", "content": """Instruction:
how many r in strawberry


Solution:
There are **three** "r"s in "strawberry."
"""}
]

# Step 1: Apply chat template to get formatted text as a string
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Step 2: Tokenize the formatted text into a dictionary of tensors.
# The chat template already inserts its special tokens, so don't add them twice.
inputs = tokenizer(formatted_text, return_tensors="pt", add_special_tokens=False).to(model.device)

# Generate the response (the reasoning trace); max_new_tokens is an upper bound
outputs = model.generate(**inputs, max_new_tokens=32000)

# Decode only the newly generated tokens, i.e. the reasoning trace
prompt_length = inputs["input_ids"].shape[-1]
generated_text = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(generated_text)

Advanced Usage

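# unsloth
from unsloth import FastLanguageModel

# Load the model and tokenizer.
# max_seq_length must leave room for the prompt plus max_new_tokens below;
# 32768 is an assumed value, not one given by the author.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="secemp9/TraceBack-12b",
    max_seq_length=32768,
)
# Switch the model into unsloth's inference mode before generating
FastLanguageModel.for_inference(model)

# Define the messages (replace the instruction and solution with your own input)
messages = [
    {"role": "user", "content": """Instruction:
how many r in strawberry


Solution:
There are **three** "r"s in "strawberry."
"""}
]

# Step 1: Apply chat template to get formatted text as a string
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Step 2: Tokenize the formatted text into a dictionary of tensors.
# The chat template already inserts its special tokens, so don't add them twice.
inputs = tokenizer(formatted_text, return_tensors="pt", add_special_tokens=False).to(model.device)

# Generate the response
outputs = model.generate(**inputs, max_new_tokens=32000)

# Decode only the newly generated tokens, i.e. the reasoning trace
prompt_length = inputs["input_ids"].shape[-1]
generated_text = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(generated_text)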

📚 Detailed Documentation

Inference Example

An instruction and solution taken from ChatGPT serve as the input: both are passed to the model together.

Dataset Example

Each dataset record pairs an instruction + solution (input) with a reasoning trace (output).

{
  "messages": [
    {
      "role": "user",
      "content": "Instruction:\ntext_here\n\nSolution:\ntext_here"
    },
    {
      "role": "assistant",
      "content": "text_here"
    }
  ]
}
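The training file is JSONL (one JSON object per line), so each record has to store the newlines inside `content` as `\n` escapes rather than as literal line breaks.

This minimal sketch writes records in the shape above with the standard `json` module. `make_record` and `to_jsonl_line` are hypothetical helpers. The user-turn layout copies the dataset example, with one blank line before `Solution:`.

```python
import json
from typing import Dict


def make_record(instruction: str, solution: str, thought: str) -> Dict:
    """Build one training example in the messages format shown above."""
    user_content = f"Instruction:\n{instruction}\n\nSolution:\n{solution}"
    return {
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": thought},
        ]
    }


def to_jsonl_line(record: Dict) -> str:
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    rec = make_record("text_here", "text_here", "text_here")
    line = to_jsonl_line(rec)
    print(line)
    assert "\n" not in line         # valid JSONL: one record per line
    assert json.loads(line) == rec  # round-trips losslessly
```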

Prompt Format

The current prompt format is:

Instruction:
Solution:
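This helper fills in the template the same way the usage examples do: the instruction, two blank lines, then the solution, with a trailing newline.

It is a sketch that copies the spacing from those examples. The document does not say whether the model is sensitive to the exact number of blank lines. `build_prompt` is a hypothetical name.

```python
def build_prompt(instruction: str, solution: str) -> str:
    """Fill the "Instruction: / Solution:" template as in the usage examples."""
    return (
        "Instruction:\n"
        f"{instruction.strip()}\n"
        "\n\n"  # two blank lines, as in the examples
        "Solution:\n"
        f"{solution.strip()}\n"
    )


if __name__ == "__main__":
    messages = [
        {
            "role": "user",
            "content": build_prompt(
                "how many r in strawberry",
                'There are **three** "r"s in "strawberry."',
            ),
        }
    ]
    print(messages[0]["content"])
```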

The model output currently has no special formatting; it consists only of the reasoning trace.

Axolotl Configuration

The unsloth training code was converted into an Axolotl config file, with DeepSpeed used for multi-GPU training.

config.yml

# Base model configuration
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
load_in_4bit: true

# Dataset configuration
datasets:
  - path: instruction_solution_to_thought_dataset.jsonl
    type: chat_template

# Chat template
chat_template: chatml

# LoRA adapter configuration (QLoRA: LoRA adapters on the 4-bit base model)
adapter: qlora
lora_r: 16
lora_alpha: 16
lora_dropout: 0
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

# Training hyperparameters
sequence_len: 128000          # Axolotl's key for the maximum sequence length
micro_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 3e-5
num_epochs: 3
warmup_steps: 100
optimizer: adamw_8bit
weight_decay: 0.01
lr_scheduler: cosine          # Axolotl's key for the learning-rate schedule
max_grad_norm: 1.0
output_dir: ./outputs_solution_to_thought
seed: 3407
merge_lora: true
hub_model_id: secemp9/TraceBack-12b   # Hub repo to push the trained model to
xformers_attention: false     # flash attention is used instead
flash_attention: true
bf16: true          # Enable BF16 mixed precision
# Multi-GPU training with DeepSpeed
deepspeed: deepspeed_configs/zero2.json

# Optional: Enable gradient checkpointing
gradient_checkpointing: true
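With these settings, the effective batch size per optimizer step is `micro_batch_size × gradient_accumulation_steps × number_of_GPUs`.

The sketch below works that out for the 200k-sample fine-tuning set described earlier. The GPU counts are assumptions, since the document does not state them. Sequence packing or length filtering would also change the real step count, and trainers may differ by one step in how they round.

```python
import math


def effective_batch_size(micro_batch_size: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Examples consumed per optimizer step under data parallelism."""
    return micro_batch_size * grad_accum_steps * num_gpus


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    # Rounded up so a final partial batch is counted as a step.
    return math.ceil(num_examples / batch_size)


if __name__ == "__main__":
    NUM_EXAMPLES = 200_000  # sample count from the fine-tuning description
    for gpus in (1, 2, 4, 8):  # assumption: the actual GPU count is not stated
        bs = effective_batch_size(2, 8, gpus)  # micro_batch_size=2, accumulation=8
        print(f"{gpus} GPU(s): batch {bs}, ~{steps_per_epoch(NUM_EXAMPLES, bs)} steps/epoch")
```

On one GPU this gives a batch of 16 and 12,500 steps per epoch; on 8 GPUs, a batch of 128 and 1,563 steps.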

deepspeed_configs/zero2.json

{
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "bf16": {
    "enabled": true
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "weight_decay": "auto",
      "betas": [0.9, 0.999],
      "eps": 1e-8
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "steps_per_print": 10,
  "wandb": {
    "enabled": true
  }
}
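Several fields in this file are set to `"auto"`. When DeepSpeed runs through the Hugging Face Trainer, as it does under Axolotl, those placeholders are filled in from the training arguments, such as the learning rate and batch settings in `config.yml`.

This short sketch lists which keys are left to be filled. It inlines the config above as a string; reading `deepspeed_configs/zero2.json` from disk with `json.load` would work the same way. `find_auto_fields` is a hypothetical helper.

```python
import json
from typing import Any, Iterator

ZERO2_JSON = """
{
  "zero_optimization": {"stage": 2, "allgather_partitions": true,
                        "allgather_bucket_size": 2e8, "overlap_comm": true,
                        "reduce_scatter": true, "reduce_bucket_size": 2e8,
                        "contiguous_gradients": true},
  "bf16": {"enabled": true},
  "optimizer": {"type": "AdamW",
                "params": {"lr": "auto", "weight_decay": "auto",
                           "betas": [0.9, 0.999], "eps": 1e-8}},
  "scheduler": {"type": "WarmupLR",
                "params": {"warmup_min_lr": 0, "warmup_max_lr": "auto",
                           "warmup_num_steps": "auto"}},
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "steps_per_print": 10,
  "wandb": {"enabled": true}
}
"""


def find_auto_fields(node: Any, prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every value equal to the string "auto"."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from find_auto_fields(value, path)
    elif node == "auto":
        yield prefix


if __name__ == "__main__":
    config = json.loads(ZERO2_JSON)
    for path in find_auto_fields(config):
        print(path)
```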