TraceBack 12b
🚀 TraceBack 12b Release
TraceBack is a solution conceived while thinking about how to effectively scale the generation of reasoning-trace data. It turns out that creating reasoning traces doesn't have to rely solely on reasoning models (such as r1, o1, o3, etc.).
This project has multiple objectives, primarily:
- Facilitating faster synthetic reasoning dataset generation. Since a smaller model (smaller than r1, etc.) is used, inference is quicker, making it easier to scale.
- Distilling on synthetic traces for out-of-domain, non-verifiable problems.
- Converting any non-reasoning model's outputs/datasets into a synthetic reasoning dataset when used as input.
So far, the current proof-of-concept has achieved the first and third goals. There is still significant room for improvement, as:
- Only Mistral Nemo 12b is used as the base model.
- It was only trained for 2 epochs.
- Only 200k samples were used for fine-tuning (QLoRA); the dataset is available at https://huggingface.co/datasets/secemp9/instruction_solution_thought (see the loading sketch below).
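For reference, the training data can be inspected with the Hugging Face `datasets` library. The snippet below is a minimal loading sketch, not part of the original release; the `train` split name and the `messages` column are assumptions based on the Dataset Example section further down.

```python
from datasets import load_dataset

# Load the instruction + solution -> reasoning-trace dataset from the Hugging Face Hub
dataset = load_dataset("secemp9/instruction_solution_thought", split="train")

# Each row is expected to hold a "messages" list: a user turn with the
# instruction and solution, followed by an assistant turn with the trace
print(dataset[0])
```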
This model was trained using both instructions and solutions as input, and the output is a plausible reasoning trace based on them. The creator believes this is the future of reasoning data generation. Stay tuned for an evaluation release.
🚀 Quick Start
This section provides a high-level overview of the TraceBack project and its main goals. For more detailed usage, installation, and configuration information, refer to the subsequent sections.
✨ Features
- Efficient Dataset Generation: Enables faster synthetic reasoning dataset generation with a smaller model for quicker inference.
- Out-of-Domain Adaptability: Can distill on synthetic traces for non-verifiable problems outside the domain.
- Dataset Conversion: Converts non-reasoning model outputs/datasets into synthetic reasoning datasets.
📦 Installation
The installation process involves setting up the necessary libraries and models. The project uses the `transformers` and `unsloth` libraries. You can install them with the following commands:
```bash
pip install transformers
pip install unsloth
```
💻 Usage Examples
Basic Usage
Here is a basic example using the `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the tokenizer and model
model_name = "secemp9/TraceBack-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Move the model to the desired device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
# Define the messages
messages = [
{"role": "user", "content": """Instruction:
how many r in strawberry
Solution:
There are **three** "r"s in "strawberry."
"""}
]
# Step 1: Apply chat template to get formatted text as a string
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Step 2: Tokenize the formatted text into a dictionary of tensors
inputs = tokenizer(formatted_text, return_tensors="pt").to(device)
# Generate the response
outputs = model.generate(**inputs, max_new_tokens=32000)
# Decode and print the output
generated_text = tokenizer.decode(outputs[0])
print(generated_text)
```
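Note that `outputs[0]` contains the prompt tokens followed by the generated trace. If you only want the reasoning trace itself, one option (an addition to the card's example, not from the original) is to slice off the prompt before decoding:

```python
# Decode only the newly generated tokens, dropping the prompt and special tokens
trace_only = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(trace_only)
```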
Advanced Usage
Using the `unsloth` library for more advanced scenarios:

```python
from unsloth import FastLanguageModel
# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained("secemp9/TraceBack-12b")
# Define the messages (replace the example content with your actual instruction and solution)
messages = [
{"role": "user", "content": """Instruction:
how many r in strawberry
Solution:
There are **three** "r"s in "strawberry."
"""}
]
# Step 1: Apply chat template to get formatted text as a string
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Step 2: Tokenize the formatted text into a dictionary of tensors
inputs = tokenizer(formatted_text, return_tensors="pt").to(model.device)
# Generate the response
outputs = model.generate(**inputs, max_new_tokens=32000)
# Decode and print the output
generated_text = tokenizer.decode(outputs[0])
print(generated_text)
```
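If your installed unsloth version exposes the `FastLanguageModel.for_inference` helper (an assumption about your unsloth version, not something stated in the card), you can switch the model into its optimized inference mode before calling `generate`:

```python
# Enable unsloth's faster inference path (call once, before generation)
FastLanguageModel.for_inference(model)
```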
📚 Documentation
Inference Example
Here is a simple example using a ChatGPT instruction + solution pair as input. Pass both the instruction and the solution to the model:
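The original card illustrated this with a screenshot. As a stand-in, the input simply combines the instruction and the solution, reusing the example from the Usage section:

```
Instruction:
how many r in strawberry
Solution:
There are **three** "r"s in "strawberry."
```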
Dataset Example
The dataset consists of instruction + solution → reasoning trace pairs. Here is a sample conversation:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "Instruction:\ntext_here\nSolution:\ntext_here"
    },
    {
      "role": "assistant",
      "content": "text_here"
    }
  ]
}
```
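To convert an existing non-reasoning dataset into this format, a minimal sketch could look like the following. It reuses the `transformers` setup from the Usage section; the `pairs` list, the output filename, and the generation settings are illustrative assumptions, not the author's exact pipeline:

```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "secemp9/TraceBack-12b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Hypothetical (instruction, solution) pairs taken from any non-reasoning dataset
pairs = [
    ("how many r in strawberry", 'There are **three** "r"s in "strawberry."'),
]

with open("instruction_solution_to_thought.jsonl", "w") as f:
    for instruction, solution in pairs:
        user_content = f"Instruction:\n{instruction}\nSolution:\n{solution}\n"
        messages = [{"role": "user", "content": user_content}]
        formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        inputs = tokenizer(formatted, return_tensors="pt").to(device)
        outputs = model.generate(**inputs, max_new_tokens=32000)
        # Keep only the newly generated tokens, i.e. the reasoning trace
        trace = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
        # Store the pair in the instruction + solution -> reasoning trace format shown above
        record = {"messages": messages + [{"role": "assistant", "content": trace}]}
        f.write(json.dumps(record) + "\n")
```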
Prompt Format
The current prompt format is simple:
```
Instruction:
Solution:
```
The model output is just raw reasoning text without any specific formatting for now.
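A small helper for assembling this prompt from an instruction/solution pair might look like the sketch below; the function name and signature are illustrative, only the `Instruction:`/`Solution:` labels come from the card:

```python
def build_traceback_prompt(instruction: str, solution: str) -> str:
    """Assemble the instruction + solution prompt expected by TraceBack."""
    return f"Instruction:\n{instruction}\nSolution:\n{solution}\n"

# Example: format the strawberry question used throughout this card
print(build_traceback_prompt("how many r in strawberry",
                             'There are **three** "r"s in "strawberry."'))
```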
Axolotl config
The following is the Axolotl configuration used for training, along with DeepSpeed settings:
config.yml
```yaml
# Base model configuration
base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
load_in_4bit: true

# Dataset configuration
datasets:
  - path: instruction_solution_to_thought_dataset.jsonl
    type: chat_template

# Chat template
chat_template: chatml

# LoRA adapter configuration
adapter: lora
lora_r: 16
lora_alpha: 16
lora_dropout: 0
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

# Training hyperparameters
max_seq_length: 128000
micro_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 3e-5
num_epochs: 3
warmup_steps: 100
optimizer: adamw_8bit
weight_decay: 0.01
lr_scheduler_type: cosine
max_grad_norm: 1.0
output_dir: ./outputs_solution_to_thought
seed: 3407
merge_lora: true
hf_upload: true
hf_repo: secemp9/TraceBack-12b
xformers_attention:
flash_attention: true
bf16: true  # Enable BF16 mixed precision

# Multi-GPU training with DeepSpeed
deepspeed: deepspeed_configs/zero2.json

# Optional: Enable gradient checkpointing
gradient_checkpointing: true
```
deepspeed_configs/zero2.json
```json
{
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "bf16": {
    "enabled": true
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "weight_decay": "auto",
      "betas": [0.9, 0.999],
      "eps": 1e-8
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "steps_per_print": 10,
  "wandb": {
    "enabled": true
  }
}
```
🔧 Technical Details
The model is based on `unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit` and uses techniques like LoRA for fine-tuning. It was trained on datasets such as `instruction_solution_to_thought_dataset.jsonl` and `secemp9/instruction_solution_thought`. The training process involves multiple steps, including tokenization, applying chat templates, and using DeepSpeed for multi-GPU training.
📄 License
This project is licensed under the Apache-2.0 license.
📦 Model Information
| Property | Details |
|---|---|
| Model Type | outputs_solution_to_thought |
| Training Data | instruction_solution_to_thought_dataset.jsonl, secemp9/instruction_solution_thought |