# 🚀 Superthoughts Lite v2 MOE Llama3.2
A powerful and lite reasoning model for various tasks including chemistry, code, math, and conversations.
## 🚀 Quick Start
This is the GGUF version of Superthoughts Lite v2 MOE Llama3.2, with 3.91B parameters and 4 experts in total, 2 of which are active.
You can access different versions of the model.

## ✨ Features
- Better Performance: This non-experimental version offers better accuracy across all tasks, stronger performance, and less looping while generating responses.
- Powerful Reasoning: A powerful, lite reasoning model trained with multiple experts, including chat, math, code, and science reasoning experts.
- Replacement for v1: A direct replacement for Pinkstack/Superthoughts-lite-v1, with much better code generation and text performance.
## 📚 Documentation
### Model Training
We trained the model by first creating a base model shared by all the experts, fine-tuned with GRPO techniques using Unsloth on top of meta-llama/Llama-3.2-1B-Instruct. After that, we trained each expert with SFT and then ran GRPO again. There are 4 experts in total:
- Chat reasoning expert
- Math reasoning expert
- Code reasoning expert
- Science reasoning expert
### System Prompt
You should use the following system prompt:

```
Thinking: enabled.
Follow this format strictly:
<think>
Write your step-by-step reasoning here.
Break down the problem into smaller parts.
Solve each part systematically.
Check your work and verify the answer makes sense.
</think>
[Your final answer after thinking].
```
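As a sketch, the system prompt can be wired into a standard OpenAI-style chat message list. The helper name below is hypothetical; the message structure itself is the common format accepted by runtimes such as llama.cpp and Ollama.

```python
# Recommended system prompt for this model (copied from the section above).
SYSTEM_PROMPT = """Thinking: enabled.
Follow this format strictly:
<think>
Write your step-by-step reasoning here.
Break down the problem into smaller parts.
Solve each part systematically.
Check your work and verify the answer makes sense.
</think>
[Your final answer after thinking]."""


def build_messages(user_prompt: str) -> list[dict]:
    """Build an OpenAI-style message list with the recommended system prompt.

    `build_messages` is an illustrative helper, not part of any shipped API.
    """
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```

The resulting list can be passed to any chat-completions endpoint that accepts `messages`.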
### Model Information
The model can generate up to 16,380 tokens and has a context size of 131,072. It has been fine-tuned to place its reasoning between `<think>` and `</think>` XML tags. It may still loop occasionally, but this is rare.
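Because the reasoning is wrapped in `<think>` tags, it can be separated from the final answer with a small helper. This is a minimal sketch (the function is illustrative, not part of the model's tooling):

```python
import re


def split_reasoning(text: str):
    """Split model output into (reasoning, answer).

    Returns (None, text) when no <think>...</think> block is present.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return None, text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer
```

This lets a UI show the final answer while keeping the chain of thought collapsible or hidden.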
### Limitations
- Safety Alignment: Only minimal safety alignment was performed, so the model can be uncensored at times.
- Hallucination: All large language models (LLMs), including this one, can hallucinate and output false information. Always double-check responses.
- Knowledge Dependency: The chat model may make things up unless you provide it with proper information.
### License
By using this model, you agree to the [LLAMA 3.2 COMMUNITY LICENSE](https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/LICENSE.txt).
### GGUF Template

```
{{ if .Messages }}
{{- if or .System .Tools }}<|start_header_id|>system<|end_header_id|>
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}
You are a helpful assistant with tool calling capabilities. When you receive a tool call response, use the output to format an answer to the original user question.
{{- end }}
{{- end }}<|eot_id|>
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{- if and $.Tools $last }}
Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.
Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.
{{ $.Tools }}
{{- end }}
{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>
{{ end }}
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{- if .ToolCalls }}
{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
{{- else }}
{{ .Content }}{{ if not $last }}<|eot_id|>{{ end }}
{{- end }}
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>
{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>
{{ end }}
{{- end }}
{{- end }}
{{- else }}
{{- if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ end }}{{ .Response }}{{ if .Response }}<|eot_id|>{{ end }}
```
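When tools are supplied, the template instructs the model to answer with a single JSON object of the form `{"name": ..., "parameters": ...}`. A minimal sketch of parsing such a response on the client side (the function is illustrative; real outputs may need more defensive handling):

```python
import json


def parse_tool_call(output: str):
    """Return a dict with "name" and "parameters" keys if `output` is a
    well-formed tool call per the template's format, else None.

    Assumes the model emitted exactly one JSON object, as the template asks.
    """
    try:
        call = json.loads(output.strip())
    except json.JSONDecodeError:
        return None
    if isinstance(call, dict) and "name" in call and "parameters" in call:
        return call
    return None
```

If `None` is returned, the output can be treated as a normal text reply instead of a tool call.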
## 📦 Information Table

| Property | Details |
|----------|---------|
| Model Type | Superthoughts Lite v2 MOE Llama3.2 |
| Training Data | First, a base model was fine-tuned using GRPO techniques with Unsloth on top of meta-llama/Llama-3.2-1B-Instruct. Then, each expert was trained using SFT and GRPO again. |
| Token Limit | 16,380 |
| Context Size | 131,072 |
## ⚠️ Important Note
- The model has minimal safety alignment, so it can be uncensored at times.
- All large language models can hallucinate and output false information. Always double-check responses.
- The chat model may make things up if not provided with proper information.
## 💡 Usage Tip
Make sure to use the provided system prompt for better results.