TraceBack 12b
🚀 TraceBack 12b Release
TraceBack is a solution conceived while thinking about how to effectively scale the generation of reasoning-trace data. It turns out that creating reasoning traces doesn't have to rely solely on reasoning models (such as r1, o1, o3, etc.).
This project has multiple objectives, primarily:
- Facilitating faster synthetic reasoning dataset generation. Since a smaller model (smaller than r1, etc.) is used, inference is quicker, making it easier to scale.
- Distilling on synthetic traces for out-of-domain, non-verifiable problems.
- Converting any non-reasoning model's outputs/datasets into a synthetic reasoning dataset when used as input.
So far, the current proof-of-concept has achieved the first and third goals. There is still significant room for improvement, as:
- Only Mistral Nemo 12b is used as the base model.
- It was only trained for 2 epochs.
- Only 200k samples were used for fine-tuning (QLoRA); the dataset is available at https://huggingface.co/datasets/secemp9/instruction_solution_thought (see the loading sketch below).
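For reference, the training data can be inspected with the Hugging Face `datasets` library. The snippet below is a minimal loading sketch, not part of the original release; the `train` split name and the `messages` column are assumptions based on the Dataset Example section further down.

```python
from datasets import load_dataset

# Load the instruction + solution -> reasoning-trace dataset from the Hugging Face Hub
dataset = load_dataset("secemp9/instruction_solution_thought", split="train")

# Each row is expected to hold a "messages" list: a user turn with the
# instruction and solution, followed by an assistant turn with the trace
print(dataset[0])
```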
This model was trained using both instructions and solutions as input, and the output is a plausible reasoning trace based on them. The creator believes this is the future of reasoning data generation. Stay tuned for an evaluation release.
🚀 Quick Start
This section provides a high-level overview of the TraceBack project and its main goals. For more detailed usage, installation, and configuration information, refer to the subsequent sections.
✨ Features
- Efficient Dataset Generation: Enables faster synthetic reasoning dataset generation with a smaller model for quicker inference.
- Out-of-Domain Adaptability: Can distill on synthetic traces for non-verifiable problems outside the domain.
- Dataset Conversion: Converts non-reasoning model outputs/datasets into synthetic reasoning datasets.
📦 Installation
The installation process involves setting up the necessary libraries and models. The project uses the `transformers` and `unsloth` libraries. You can install them with the following commands:
```bash
pip install transformers
pip install unsloth
```
💻 Usage Examples
Basic Usage
Here is a basic example using the `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the tokenizer and model
model_name = "secemp9/TraceBack-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Move the model to the desired device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
# Define the messages
messages = [
{"role": "user", "content": """Instruction:
how many r in strawberry
Solution:
There are **three** "r"s in "strawberry."
"""}
]
# Step 1: Apply chat template to get formatted text as a string
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Step 2: Tokenize the formatted text into a dictionary of tensors
inputs = tokenizer(formatted_text, return_tensors="pt").to(device)
# Generate the response
outputs = model.generate(**inputs, max_new_tokens=32000)
# Decode and print the output
generated_text = tokenizer.decode(outputs[0])
print(generated_text)
```
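Note that `outputs[0]` contains the prompt tokens followed by the generated trace. If you only want the reasoning trace itself, one option (an addition to the card's example, not from the original) is to slice off the prompt before decoding:

```python
# Decode only the newly generated tokens, dropping the prompt and special tokens
trace_only = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(trace_only)
```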
Advanced Usage
Using the `unsloth` library for more advanced scenarios:

```python
from unsloth import FastLanguageModel
# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained("secemp9/TraceBack-12b")
# Define the messages (replace the example content with your actual instruction and solution)
messages = [
{"role": "user", "content": """Instruction:
how many r in strawberry
Solution:
There are **three** "r"s in "strawberry."
"""}
]
# Step 1: Apply chat template to get formatted text as a string
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Step 2: Tokenize the formatted text into a dictionary of tensors
inputs = tokenizer(formatted_text, return_tensors="pt").to(model.device)
# Generate the response
outputs = model.generate(**inputs, max_new_tokens=32000)
# Decode and print the output
generated_text = tokenizer.decode(outputs[0])
print(generated_text)
```
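If your installed unsloth version exposes the `FastLanguageModel.for_inference` helper (an assumption about your unsloth version, not something stated in the card), you can switch the model into its optimized inference mode before calling `generate`:

```python
# Enable unsloth's faster inference path (call once, before generation)
FastLanguageModel.for_inference(model)
```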
📚 Documentation
Inference Example
Here is a simple example using a ChatGPT instruction + solution pair as input. Pass both the instruction and the solution to the model:
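The original card illustrated this with a screenshot. As a stand-in, the input simply combines the instruction and the solution, reusing the example from the Usage section:

```
Instruction:
how many r in strawberry
Solution:
There are **three** "r"s in "strawberry."
```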
Dataset Example
The dataset consists of instruction + solution → reasoning trace pairs. Here is a sample conversation:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "Instruction:\ntext_here\nSolution:\ntext_here"
    },
    {
      "role": "assistant",
      "content": "text_here"
    }
  ]
}
```
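To convert an existing non-reasoning dataset into this format, a minimal sketch could look like the following. It reuses the `transformers` setup from the Usage section; the `pairs` list, the output filename, and the generation settings are illustrative assumptions, not the author's exact pipeline:

```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "secemp9/TraceBack-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Hypothetical (instruction, solution) pairs taken from any non-reasoning dataset
pairs = [
    ("how many r in strawberry", 'There are **three** "r"s in "strawberry."'),
]

with open("instruction_solution_to_thought.jsonl", "w") as f:
    for instruction, solution in pairs:
        user_content = f"Instruction:\n{instruction}\nSolution:\n{solution}\n"
        messages = [{"role": "user", "content": user_content}]
        formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        inputs = tokenizer(formatted, return_tensors="pt").to(device)
        outputs = model.generate(**inputs, max_new_tokens=32000)
        # Keep only the newly generated tokens, i.e. the reasoning trace
        trace = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
        # Store the pair in the instruction + solution -> reasoning trace format shown above
        record = {"messages": messages + [{"role": "assistant", "content": trace}]}
        f.write(json.dumps(record) + "\n")
```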
Prompt Format
The current prompt format is simple:
```
Instruction:
Solution:
```
The model output is just raw reasoning text without any specific formatting for now.
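A small helper for assembling this prompt from an instruction/solution pair might look like the sketch below; the function name and signature are illustrative, only the `Instruction:`/`Solution:` labels come from the card:

```python
def build_traceback_prompt(instruction: str, solution: str) -> str:
    """Assemble the instruction + solution prompt expected by TraceBack."""
    return f"Instruction:\n{instruction}\nSolution:\n{solution}\n"

# Example: format the strawberry question used throughout this card
print(build_traceback_prompt("how many r in strawberry",
                             'There are **three** "r"s in "strawberry."'))
```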
Axolotl config
The following is the Axolotl configuration used for training, along with DeepSpeed settings:
config.yml
```yaml
# Base model configuration
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
load_in_4bit: true

# Dataset configuration
datasets:
  - path: instruction_solution_to_thought_dataset.jsonl
    type: chat_template

# Chat template
chat_template: chatml

# LoRA adapter configuration
adapter: lora
lora_r: 16
lora_alpha: 16
lora_dropout: 0
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

# Training hyperparameters
max_seq_length: 128000
micro_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 3e-5
num_epochs: 3
warmup_steps: 100
optimizer: adamw_8bit
weight_decay: 0.01
lr_scheduler_type: cosine
max_grad_norm: 1.0
output_dir: ./outputs_solution_to_thought
seed: 3407
merge_lora: true
hf_upload: true
hf_repo: secemp9/TraceBack-12b
xformers_attention:
flash_attention: true
bf16: true  # Enable BF16 mixed precision

# Multi-GPU training with DeepSpeed
deepspeed: deepspeed_configs/zero2.json

# Optional: Enable gradient checkpointing
gradient_checkpointing: true
```
deepspeed_configs/zero2.json
```json
{
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "bf16": {
    "enabled": true
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "weight_decay": "auto",
      "betas": [0.9, 0.999],
      "eps": 1e-8
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "steps_per_print": 10,
  "wandb": {
    "enabled": true
  }
}
```
🔧 Technical Details
The model is based on `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit` and uses techniques like LoRA for fine-tuning. It was trained on datasets such as `instruction_solution_to_thought_dataset.jsonl` and `secemp9/instruction_solution_thought`. The training process involves multiple steps, including tokenization, applying chat templates, and using DeepSpeed for multi-GPU training.
📄 License
This project is licensed under the Apache-2.0 license.
📦 Model Information
| Property | Details |
|---|---|
| Model Type | outputs_solution_to_thought |
| Training Data | instruction_solution_to_thought_dataset.jsonl, secemp9/instruction_solution_thought |