TraceBack-12b Open-Source Model - Free Deployment for Instruction Following and Chain-of-Thought Reasoning Tasks

TraceBack 12b

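Developed by secemp9

TraceBack 12b is a 4-bit quantized model built on the Mistral-Nemo-Instruct architecture, focused on instruction following and chain-of-thought reasoning tasks.

Large Language Model

Transformers

License: Apache-2.0 #instruction-tuning #4bit-quantization #chain-of-thought-reasoning

Downloads: 1,470

Released: 3/5/2025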

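Model Overview

An instruction-following model fine-tuned to generate chain-of-thought reasoning traces from an instruction and its solution. It is 4-bit quantized for efficient inference.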

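Model Features

4-bit quantization

The model is 4-bit quantized, which substantially reduces memory requirements while retaining good performance

Instruction optimization

Optimized specifically for instruction-following tasks, so it can accurately understand and carry out complex instructions

Chain-of-thought reasoning

Supports chain-of-thought reasoning, showing the complete thought process behind solving a problem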

Capabilities

Text generation

Instruction understanding

Question answering

Reasoning task processing

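Use Cases

Education

Generating solution reasoning

Helps students understand how complex problems are solved

Provides step-by-step solution reasoning

Research assistance

Expanding research ideas

Helps researchers generate and evaluate research ideas

Offers analysis from multiple perspectives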

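🚀 TraceBack 12b Release

TraceBack is something I came up with while thinking about how to scale reasoning-trace data generation effectively. It turns out you don't need to rely solely on reasoning models (such as r1, o1, o3, etc.) to create reasoning traces!

It has several goals, but the main ones are:

1. Faster synthetic reasoning dataset generation: the model used here is smaller than r1 and similar models, so inference is faster and the approach is easier to scale.
2. Distilling synthetic traces to tackle out-of-domain problems that are not verifiable.
3. Turning any non-reasoning model output/dataset into a synthetic reasoning dataset when it is used as input.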

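So far, the proof of concept achieves goals 1 and 3. I plan to scale it further, since it is currently limited:

Only Mistral Nemo 12b has been used as the base model so far.
It was trained for only 2 epochs.
Fine-tuning (QLoRA) used only 200k samples, from the dataset secemp9/instruction_solution_thought.

So there is still a lot of room for improvement.

The model was trained with an instruction and a solution as input; its output is a plausible reasoning trace based on them.

I believe this is the future of reasoning data generation. Stay tuned for the evaluation release.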

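🚀 Quick Start

TraceBack aims to scale reasoning-trace data generation efficiently. Rather than relying on conventional reasoning models, it uses a small model for faster inference and dataset generation. Details about the model follow.

Model Information

Property	Details
Model type	outputs_solution_to_thought
Training data	instruction_solution_to_thought_dataset.jsonl, secemp9/instruction_solution_thought
Base model	unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
License	apache-2.0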

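✨ Key Features

Fast synthetic reasoning dataset generation: uses a small model, so inference is fast and easy to scale.
Out-of-domain problems: synthetic traces can be distilled to address out-of-domain problems that are not verifiable.
Data conversion: converts non-reasoning model outputs/datasets into synthetic reasoning datasets.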

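📦 Installation

The documentation does not provide specific installation steps; install the libraries imported in the code examples below (transformers and torch, plus unsloth for the advanced example).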
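As a small convenience, the sketch below checks whether the packages imported by the usage examples are available, without actually importing them. The package list is an assumption drawn from the examples' imports (torch and transformers; unsloth only for the advanced example); extend it with anything else your environment needs.

```python
import importlib.util

# Packages imported by the usage examples (assumed list; unsloth is only
# needed for the unsloth-based example)
REQUIRED = ["torch", "transformers"]
OPTIONAL = ["unsloth"]


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment.

    importlib.util.find_spec locates a top-level module without importing it,
    so the check stays cheap even for heavy packages such as torch.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing required:", missing_packages(REQUIRED) or "none")
    print("missing optional:", missing_packages(OPTIONAL) or "none")
```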

💻 Usage Examples

Basic Usage

# Using transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model.
# torch_dtype="auto" keeps the checkpoint's stored precision instead of defaulting to float32;
# device_map="auto" (requires the accelerate package) places the weights on the available GPU(s), falling back to CPU.
model_name = "secemp9/TraceBack-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Define the messages: the user turn carries the instruction and its solution
messages = [
    {"role": "user", "content": """Instruction:
how many r in strawberry


Solution:
There are **three** "r"s in "strawberry."
"""}
]

# Step 1: Apply chat template to get formatted text as a string
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Step 2: Tokenize the formatted text into a dictionary of tensors on the model's device
inputs = tokenizer(formatted_text, return_tensors="pt").to(model.device)

# Generate the reasoning trace (max_new_tokens is an upper bound; generation stops earlier at EOS)
outputs = model.generate(**inputs, max_new_tokens=32000)

# Decode only the newly generated tokens, i.e. the reasoning trace without the prompt
prompt_length = inputs["input_ids"].shape[-1]
generated_text = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(generated_text)

Advanced Usage

# unsloth
from unsloth import FastLanguageModel

# Load the model and tokenizer in 4-bit; max_seq_length should cover the prompt plus max_new_tokens
model, tokenizer = FastLanguageModel.from_pretrained(
    "secemp9/TraceBack-12b",
    max_seq_length=32768,
    load_in_4bit=True,
)

# Switch unsloth into its inference mode before calling generate
FastLanguageModel.for_inference(model)

# Define the messages (replace the instruction and solution with your own input)
messages = [
    {"role": "user", "content": """Instruction:
how many r in strawberry


Solution:
There are **three** "r"s in "strawberry."
"""}
]

# Step 1: Apply chat template to get formatted text as a string
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Step 2: Tokenize the formatted text into a dictionary of tensors on the model's device
inputs = tokenizer(formatted_text, return_tensors="pt").to(model.device)

# Generate the reasoning trace
outputs = model.generate(**inputs, max_new_tokens=32000)

# Decode only the newly generated tokens, i.e. the reasoning trace without the prompt
prompt_length = inputs["input_ids"].shape[-1]
generated_text = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(generated_text)

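📚 Documentation

Inference Example

Using a ChatGPT instruction + solution as input: the instruction and the solution are passed to the model together.

Dataset Example

The dataset consists of instruction + solution → reasoning trace pairs, in the following format: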

{
  "messages": [
    {
      "role": "user",
      "content": "Instruction:\ntext_here\n\nSolution:\ntext_here"
    },
    {
      "role": "assistant",
      "content": "text_here"
    }
  ]
}
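Because every training line has to follow this exact shape, a quick schema check before training can catch malformed rows. The helpers below are a hypothetical sketch (not part of the TraceBack release or of Axolotl): they check that each JSONL line has a user turn starting with "Instruction:" and containing a "Solution:" section, followed by a non-empty assistant turn holding the reasoning trace.

```python
import json


def validate_record(record) -> list[str]:
    """Return a list of problems with one training example (empty list = looks valid)."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or len(messages) != 2:
        return ["expected 'messages' to be a list of exactly 2 turns"]
    user, assistant = messages
    if not isinstance(user, dict) or not isinstance(assistant, dict):
        return ["each turn must be a JSON object"]
    problems = []
    if user.get("role") != "user":
        problems.append("first turn must have role 'user'")
    if assistant.get("role") != "assistant":
        problems.append("second turn must have role 'assistant'")
    content = user.get("content", "")
    if not content.startswith("Instruction:"):
        problems.append("user content must start with 'Instruction:'")
    if "\nSolution:" not in content:
        problems.append("user content must contain a 'Solution:' section")
    if not assistant.get("content"):
        problems.append("assistant content (the reasoning trace) is empty")
    return problems


def validate_jsonl(text: str) -> dict[int, list[str]]:
    """Validate each non-empty line of a JSONL string; map 1-based line number -> problems."""
    report = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped rather than reported
        try:
            problems = validate_record(json.loads(line))
        except json.JSONDecodeError as exc:
            problems = [f"invalid JSON: {exc.msg}"]
        if problems:
            report[lineno] = problems
    return report
```

For a file on disk, pass its contents, e.g. `validate_jsonl(open("instruction_solution_to_thought_dataset.jsonl", encoding="utf-8").read())`; an empty dict means every line passed.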

Prompt Format

The current prompt format is:

Instruction:
Solution:

The model output currently has no special formatting; it is just the reasoning trace.
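Converting an existing non-reasoning dataset (goal 3 above) amounts to wrapping each instruction/solution pair in this format, generating a trace, and recombining the pieces. The helpers below are a hypothetical sketch: the single blank line between the two sections follows the dataset example (the usage examples use two blank lines, and the exact whitespace used in training is not documented), and the `<think>...</think>` wrapper in `to_reasoning_sample` is an assumed target format, since TraceBack's own output is unformatted.

```python
def build_messages(instruction: str, solution: str) -> list[dict]:
    """Wrap an instruction/solution pair in the TraceBack prompt format (single user turn)."""
    content = f"Instruction:\n{instruction.strip()}\n\nSolution:\n{solution.strip()}"
    return [{"role": "user", "content": content}]


def to_reasoning_sample(instruction: str, solution: str, trace: str) -> dict:
    """Combine a generated trace with the original pair into one reasoning-style sample.

    The <think>...</think> layout is a hypothetical choice, not something the model emits.
    """
    return {
        "prompt": instruction.strip(),
        "response": f"<think>\n{trace.strip()}\n</think>\n{solution.strip()}",
    }
```

In a conversion loop, `build_messages(...)` would be passed to `tokenizer.apply_chat_template(...)` as in the usage examples, and the decoded trace fed to `to_reasoning_sample(...)`.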

Axolotl Configuration

The unsloth training code was converted into an Axolotl config file, using DeepSpeed for multi-GPU training.

config.yml

# Base model configuration
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
load_in_4bit: true

# Dataset configuration
datasets:
  - path: instruction_solution_to_thought_dataset.jsonl
    type: chat_template

# Chat template
chat_template: chatml

# LoRA adapter configuration
adapter: lora
lora_r: 16
lora_alpha: 16
lora_dropout: 0
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

# Training hyperparameters
max_seq_length: 128000
micro_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 3e-5
num_epochs: 3
warmup_steps: 100
optimizer: adamw_8bit
weight_decay: 0.01
lr_scheduler_type: cosine
max_grad_norm: 1.0
output_dir: ./outputs_solution_to_thought
seed: 3407
merge_lora: true
hf_upload: true
hf_repo: secemp9/TraceBack-12b
xformers_attention:
flash_attention: True
bf16: true          # Enable BF16 mixed precision
# Multi-GPU training with DeepSpeed
deepspeed: deepspeed_configs/zero2.json

# Optional: Enable gradient checkpointing
gradient_checkpointing: true
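The LoRA section above trains rank-16 adapters on all seven projection matrices. As a worked example, each adapted weight of shape d_in × d_out gets two low-rank factors with r·(d_in + d_out) parameters in total, and with lora_alpha equal to lora_r the adapter update is scaled by alpha/r = 1. The sketch below counts trainable adapter parameters under assumed Mistral-Nemo-Instruct-2407 shapes (hidden size 5120, 40 layers, 32 query heads and 8 KV heads of dimension 128, MLP size 14336); these dimensions are not stated in this card, so check them against the model's config.json.

```python
# Assumed Mistral-Nemo-Instruct-2407 dimensions (verify against the model's config.json)
HIDDEN = 5120
HEAD_DIM = 128
N_HEADS = 32
N_KV_HEADS = 8
INTERMEDIATE = 14336
N_LAYERS = 40

# (in_features, out_features) of each module listed in lora_target_modules
MODULE_SHAPES = {
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank, shapes=MODULE_SHAPES, n_layers=N_LAYERS):
    """Count LoRA parameters: each adapted W gets A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * n_layers


if __name__ == "__main__":
    print(f"{lora_trainable_params(16):,}")  # lora_r: 16 -> 57,016,320
```

Under these assumptions, roughly 57M parameters are trained, a small fraction of the 12B frozen base weights.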

deepspeed_configs/zero2.json

{
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "bf16": {
    "enabled": true
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "weight_decay": "auto",
      "betas": [0.9, 0.999],
      "eps": 1e-8
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "steps_per_print": 10,
  "wandb": {
    "enabled": true
  }
}
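In the Hugging Face Trainer integration that Axolotl builds on, the "auto" entries in this DeepSpeed file are filled from the training arguments, so each GPU processes micro_batch_size: 2 samples per step, accumulated over gradient_accumulation_steps: 8, and the global batch grows with the number of GPUs. The sketch below estimates optimizer steps for the 200k-sample dataset; the 8-GPU count is a hypothetical example, and the estimate ignores sample packing and any filtering of over-length samples. The config sets num_epochs: 3 while the text above mentions 2 epochs, so pass whichever value matches your run.

```python
import math


def estimate_steps(num_samples, micro_batch_size, grad_accum, num_gpus, num_epochs):
    """Return (global batch size, optimizer steps per epoch, total optimizer steps)."""
    # One optimizer step consumes micro_batch_size samples per GPU, grad_accum times, on every GPU
    global_batch = micro_batch_size * grad_accum * num_gpus
    # A final partial batch still counts as a step
    steps_per_epoch = math.ceil(num_samples / global_batch)
    return global_batch, steps_per_epoch, steps_per_epoch * num_epochs


if __name__ == "__main__":
    # 200k samples, micro_batch_size 2, gradient_accumulation_steps 8, 8 GPUs (hypothetical), 3 epochs
    batch, per_epoch, total = estimate_steps(200_000, 2, 8, num_gpus=8, num_epochs=3)
    print(batch, per_epoch, total)  # 128 1563 4689
    # warmup_steps: 100 then covers roughly 2% of the run
    print(f"warmup fraction: {100 / total:.1%}")  # 2.1%
```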