Falcon3-Mamba-R1-v0
A fine-tuned model based on Falcon3-Mamba-7B-Instruct, optimized for logical reasoning and structured problem-solving.

Documentation
✨ Features
- This model is a fine-tuned version of Falcon3-Mamba-7B-Instruct, optimized to work through logical reasoning and structured problem-solving before generating responses.
- It uses the Mamba architecture, which scales linearly with the number of tokens, enabling efficient and fast reasoning while maintaining high-quality responses.
- The released weights come from an earlier checkpoint of the fine-tuning pipeline.
Model Details
- Developed by: Hanzla Javaid
- Base Model: tiiuae/Falcon3-Mamba-7B-Instruct
- Model Type: Mamba-based causal decoder
- Model Release Date: March 2025
| Property | Details |
|----------|---------|
| Model Type | Mamba-based causal decoder |
| Base Model | tiiuae/Falcon3-Mamba-7B-Instruct |
| Developed by | Hanzla Javaid |
| Model Release Date | March 2025 |
🎯 Intended Uses
Direct Use
- Reasoning-heavy tasks (math, logic, and structured problem-solving)
- STEM-based question answering
- General-purpose text generation
Downstream Use
- Fine-tuning for domain-specific applications such as finance, law, medicine, and research (a minimal sketch follows this list).
- Integration into chatbots and virtual assistants that require advanced reasoning skills.
- Enhancement of automated coding assistants with structured logic building.
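As a rough illustration of the fine-tuning path mentioned above, the sketch below uses TRL's `SFTTrainer` on a tiny placeholder corpus; the dataset contents, output directory, and hyperparameters are assumptions for demonstration, not a recommended recipe.

```python
# Minimal domain fine-tuning sketch (illustrative placeholders throughout).
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Placeholder domain corpus; SFTTrainer's default data format expects a "text" column.
train_data = Dataset.from_dict({
    "text": [
        "Q: Summarize the key risk factors in this filing. A: ...",
        "Q: What does clause 7.2 require from the supplier? A: ...",
    ]
})

trainer = SFTTrainer(
    model="hanzla/Falcon3-Mamba-R1-v0",  # fine-tune the released checkpoint
    train_dataset=train_data,
    args=SFTConfig(
        output_dir="falcon3-mamba-r1-domain",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
)
trainer.train()
```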
Out-of-Scope Use
- Misinformation or deceptive applications
- Automated decision-making in high-risk fields (e.g., medical diagnosis without human oversight)
- Bias-sensitive applications where fairness is critical but not explicitly controlled
⚠️ Bias and Limitations
Known Biases
- The training data is predominantly English, so performance on multilingual tasks may be weaker.
- Fine-tuning may introduce or amplify biases present in the training data, especially in areas such as ethics, politics, and cultural perspectives.
Technical Limitations
- Performance may degrade on long-form generation beyond 64K tokens.
⚠️ Important Note
Users should verify outputs for accuracy, especially in critical applications. Regular bias evaluation should be conducted when deploying in production environments.
Quick Start
To use this model, load it with `transformers`:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_name = "hanzla/Falcon3-Mamba-R1-v0"

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForCausalLM.from_pretrained(
    repo_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

def generate_text(prompt, generation_model, generation_tokenizer, max_tokens=1024):
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": prompt},
    ]
    # Build the chat-formatted prompt expected by the instruct model.
    input_text = generation_tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    input_ids = generation_tokenizer(input_text, return_tensors="pt").input_ids.to(generation_model.device)
    outputs = generation_model.generate(input_ids, max_new_tokens=max_tokens)
    # Decode only the newly generated continuation, dropping the prompt tokens.
    generated_tokens = outputs[0][len(input_ids[0]):]
    return generation_tokenizer.decode(generated_tokens, skip_special_tokens=True)
```
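For example, the helper can be called like this (the prompt is illustrative):

```python
# Uses the model and tokenizer loaded above.
answer = generate_text(
    "A train travels 60 km in 45 minutes. What is its average speed in km/h?",
    model,
    tokenizer,
)
print(answer)
```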
🔧 Technical Details
Training Procedure
- Pretrained Base Model: Falcon3-Mamba-7B-Instruct
- Fine-tuning Data: a subset of STEM problems from open-thoughts/OpenThoughts-114k
- Training Strategy: GRPO (Group Relative Policy Optimization); a setup sketch follows this list
- Training Hyperparameters:
  - Batch Size: 4
  - Epochs: 3
  - Precision: Mixed (fp16/bf16)
- Hardware: 2× H100 GPUs
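For reference, the sketch below shows roughly how such a GRPO run could be set up with TRL's `GRPOTrainer`. The toy prompts, placeholder reward function, and generation count are assumptions for illustration; the actual training used a STEM subset of OpenThoughts-114k and a task-specific reward, as described above.

```python
# GRPO setup sketch (illustrative; not the exact script used to train this model).
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt set standing in for the OpenThoughts-114k STEM subset; GRPOTrainer expects a "prompt" column.
dataset = Dataset.from_dict({
    "prompt": [
        "Solve step by step: what is 17 * 23?",
        "A rectangle has perimeter 36 and width 6. What is its area?",
    ]
})

def placeholder_reward(completions, **kwargs):
    # GRPO needs one scalar per completion; a real run would score answer correctness
    # and reasoning structure rather than output length.
    return [-abs(len(c) - 512) / 512 for c in completions]

training_args = GRPOConfig(
    output_dir="falcon3-mamba-grpo",
    per_device_train_batch_size=4,  # matches the batch size reported above
    num_train_epochs=3,             # matches the epoch count reported above
    num_generations=4,              # completions sampled per prompt for the group-relative baseline
    bf16=True,
)

trainer = GRPOTrainer(
    model="tiiuae/Falcon3-Mamba-7B-Instruct",
    reward_funcs=placeholder_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```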
Evaluation
The fine-tuned model's performance was evaluated on a range of benchmarks to assess its reasoning abilities and overall capabilities. The table below compares the fine-tuned model with the base model:
| Category | Benchmark | Falcon3-Mamba-R1-v0 | Base Falcon3-Mamba-7B-Instruct |
|----------|-----------|---------------------|--------------------------------|
| General | MMLU (5-shot) | 72.1 | 65.3 |
| Math | GSM8K (5-shot) | 89.5 | 65.2 |
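The numbers above are the author's reported results. As a rough, illustrative way to sanity-check GSM8K-style accuracy (not the harness used for this table), one could reuse the `generate_text` helper from the Quick Start; the prompt format and answer extraction below are assumptions:

```python
# Rough GSM8K-style accuracy check (illustrative; not the benchmark harness behind the table above).
import re
from datasets import load_dataset

gsm8k = load_dataset("openai/gsm8k", "main", split="test[:20]")  # small slice for a quick check

def final_number(text):
    # GSM8K references put the final answer after "####"; for model output we take the last number.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else None

correct = 0
for row in gsm8k:
    prediction = generate_text(row["question"], model, tokenizer, max_tokens=512)
    reference = row["answer"].split("####")[-1].strip()
    if final_number(prediction) == final_number(reference):
        correct += 1

print(f"Accuracy on {len(gsm8k)} samples: {correct / len(gsm8k):.2%}")
```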
Model Architecture
- Mamba Blocks: 64
- Hidden Size: 4096
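These values can be cross-checked against the loaded configuration (assuming the model is loaded as in the Quick Start; attribute names follow the `transformers` Falcon-Mamba config):

```python
# Cross-check the reported architecture against the loaded config.
print(model.config.num_hidden_layers)  # expected: 64 Mamba blocks
print(model.config.hidden_size)        # expected: 4096
```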
Software Requirements
```
transformers >= 4.38
torch >= 2.1
accelerate >= 0.25
mamba-ssm
causal-conv1d >= 1.4.0
```
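A quick sanity check that the environment satisfies these requirements:

```python
# Print installed versions and confirm the Mamba kernels are importable.
import accelerate
import torch
import transformers

print("transformers", transformers.__version__)
print("torch", torch.__version__)
print("accelerate", accelerate.__version__)

import causal_conv1d  # noqa: F401  # fused causal conv kernels
import mamba_ssm      # noqa: F401  # selective state-space kernels
```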