🚀 Motif 2.6B
We are excited to announce Motif 2.6B, a 2.6-billion-parameter language model trained from scratch on AMD Instinct™ MI250 GPUs. This release is our first step toward building helpful and reliable AI that aligns with human values. Our goal with Motif 2.6B is performance comparable to well-known open-source models such as Gemma, Llama, and Phi, especially those in the sLLM (small LLM) regime.
✨ Features
Training information
- GPUs: 384 × AMD Instinct MI250
- Training time: 42 days
- Training data: 2.4T tokens
Notice: A detailed technical report will be released at a later time.
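Taken together, these figures imply a sustained throughput of roughly 660K tokens/s across the cluster. A back-of-envelope check (assuming uninterrupted training for the full 42 days, which the card does not confirm):

```python
# Back-of-envelope throughput implied by the reported training figures.
# Assumes no downtime, restarts, or eval pauses, so this is an upper bound.
tokens = 2.4e12          # 2.4T training tokens
gpus = 384               # AMD Instinct MI250 GPUs
days = 42

seconds = days * 24 * 3600
cluster_tps = tokens / seconds       # ~661K tokens/s cluster-wide
per_gpu_tps = cluster_tps / gpus     # ~1.7K tokens/s per GPU

print(f"{cluster_tps:,.0f} tokens/s cluster-wide, {per_gpu_tps:,.0f} tokens/s per GPU")
```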
Evaluation
When a model is released, its developers typically report benchmark results under evaluation settings of their own choosing. This makes models from different organizations hard to compare, since scores vary with conditions that are not always fully disclosed. For that reason, rather than re-evaluating other models under our own settings, we quote each model's scores from its official publications.
In the Evaluation Appendix, we provide specific examples of benchmark score differences for major models to illustrate the variability of evaluation scores across reports.
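Each "Improvement" column below is the relative delta of Motif 2.6B's score against the baseline model's score, in percent. A minimal sketch of the arithmetic (the function name is ours):

```python
def relative_improvement(motif_score: float, baseline_score: float) -> float:
    """Relative delta of Motif 2.6B vs. a baseline score, in percent."""
    return (motif_score - baseline_score) / baseline_score * 100

# Example from the Mistral 7B table: HumanEval 30.5 -> 68.3
print(f"{relative_improvement(68.3, 30.5):+.2f}%")  # +123.93%
```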
Comparison to Mistral 7B by Mistral AI
The benchmarks and scores in the table below are from the Mistral 7B technical report.
| Benchmark | Metric | Mistral 7B | Motif 2.6B | Improvement |
|---|---|---|---|---|
| MMLU | 5-shot | 60.1 | 57.93 | -3.61% |
| HellaSwag | 0-shot | 81.3 | 61.35 | -24.54% |
| WinoG | 0-shot | 75.3 | 59.91 | -20.44% |
| PIQA | 0-shot | 83 | 75.95 | -8.49% |
| ARC-e | 0-shot | 80 | 87.21 | +9.01% |
| ARC-c | 0-shot | 55.5 | 74.2 | +33.69% |
| NQ | 5-shot | 28.8 | 11.14 | -61.32% |
| TriviaQA | 5-shot | 69.9 | 54.97 | -21.36% |
| HumanEval | 0-shot | 30.5 | 68.3 | +123.93% |
| MBPP | 3-shot | 47.5 | 60.3 | +26.95% |
| MATH | 4-shot, maj@4 | 13.1 | 40.2* | +206.87% |
| GSM8K | 8-shot, maj@8 | 52.2 | 80.21 | +53.66% |
| Average | | | | +34.25% |
*: We report the 4-shot, maj@1 score instead of the 4-shot, maj@4 score (maj@k = majority vote over k sampled answers).
Comparison to the Gemma series by Google
Gemma 1 & 2
The benchmarks and scores in the table below are from the Gemma 2 technical report.
Note: Although referred to as "2B", Gemma 2 2B actually has 2.6 billion parameters.
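If you want to check a parameter count yourself rather than trust a model's name, a quick probe with transformers (we use Motif here; loading Gemma weights additionally requires accepting their license on Hugging Face):

```python
from transformers import AutoModelForCausalLM

# "2B"-class model names often round the true parameter count down.
model = AutoModelForCausalLM.from_pretrained(
    "Motif-Technologies/Motif-2.6B",
    trust_remote_code=True,
)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # ~2.6B
```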
| Benchmark | Metric | Gemma 1 2B | Gemma 1 7B | Gemma 2 2B | Gemma 2 9B | Motif 2.6B | Improvement (over 1 2B) | Improvement (over 1 7B) | Improvement (over 2 2B) | Improvement (over 2 9B) |
|---|---|---|---|---|---|---|---|---|---|---|
| MMLU | 5-shot | 42.3 | 64.4 | 52.2 | 71.3 | 57.93 | +36.95% | -10.05% | +10.98% | -18.75% |
| ARC-C | 25-shot | 48.5 | 61.1 | 55.7 | 68.4 | 75.08 | +54.80% | +22.88% | +34.79% | +9.77% |
| GSM8K | 5-shot | 15.1 | 51.8 | 24.3 | 68.6 | 75.13 | +397.55% | +45.04% | +309.18% | +9.52% |
| AGIEval | 3-5-shot | 24.2 | 44.9 | 31.5 | 52.8 | - | - | - | - | - |
| DROP | 3-shot, F1 | 48.5 | 56.3 | 51.2 | 69.4 | 29.33 | -39.53% | -47.90% | -42.71% | -57.74% |
| BBH | 3-shot, CoT | 35.2 | 59 | 41.9 | 68.2 | 48.56 | +37.95% | -17.69% | +15.89% | -28.80% |
| Winogrande | 5-shot | 66.8 | 79 | 71.3 | 80.6 | 67.09 | +0.43% | -15.08% | -5.90% | -16.76% |
| HellaSwag | 10-shot | 71.7 | 82.3 | 72.9 | 81.9 | 69.89 | -2.52% | -15.08% | -4.13% | -14.66% |
| MATH | 4-shot | 11.8 | 24.3 | 16 | 36.6 | 40.2 | +240.88% | +65.43% | +151.25% | +9.84% |
| ARC-e | 0-shot | 73.2 | 81.5 | 80.6 | 88 | 87.21 | +19.14% | +7.01% | +8.20% | -0.90% |
| PIQA | 0-shot | 77.3 | 81.2 | 78.4 | 81.7 | 75.95 | -1.75% | -6.47% | -3.13% | -7.04% |
| SIQA | 0-shot | 49.7 | 51.8 | 51.9 | 53.4 | 61.97 | +24.69% | +19.63% | +19.40% | +16.05% |
| BoolQ | 0-shot | 69.4 | 83.2 | 72.7 | 84.2 | 67.76 | -2.36% | -18.56% | -6.80% | -19.52% |
| TriviaQA | 5-shot | 53.2 | 63.4 | 60.4 | 76.6 | 54.97 | +3.33% | -13.30% | -8.99% | -28.24% |
| NQ | 5-shot | 12.5 | 23 | 17.1 | 29.2 | 10.91 | -12.72% | -52.57% | -36.20% | -62.64% |
| HumanEval | pass@1 | 22 | 32.3 | 20.1 | 40.2 | 68.3 | +210.45% | +111.46% | +239.80% | +69.90% |
| MBPP | 3-shot | 29.2 | 44.4 | 30.2 | 52.4 | 60.3 | +106.51% | +35.81% | +99.67% | +15.08% |
| Average | | | | | | | +90.79% | +3.44% | +46.17% | -13.45% |
Gemma 3
The benchmarks and scores in the table below are from the Gemma 3 technical report.
| Benchmark | Metric | Gemma 3 1B | Gemma 3 4B | Motif 2.6B | Improvement (over 1B) | Improvement (over 4B) |
|---|---|---|---|---|---|---|
| HellaS | 10-shot | 62.3 | 77.2 | 69.89 | +12.18% | -9.47% |
| BoolQ | 0-shot | 63.2 | 72.3 | 67.76 | +7.22% | -6.28% |
| PIQA | 0-shot | 73.8 | 79.6 | 75.59 | +2.43% | -5.04% |
| SIQA | 0-shot | 48.9 | 51.9 | 61.97 | +26.73% | +19.40% |
| TQA | 5-shot | 39.8 | 65.8 | 54.97 | +38.12% | -16.46% |
| NQ | 5-shot | 9.48 | 20 | 10.91 | +15.08% | -45.45% |
| ARC-C | 25-shot | 38.4 | 56.2 | 75.08 | +95.52% | +33.59% |
| ARC-E | 0-shot | 73 | 82.4 | 87.21 | +19.47% | +5.84% |
| WinoG | 5-shot | 58.2 | 64.7 | 67.09 | +15.27% | +3.69% |
| BBH | few-shot, CoT | 28.4 | 50.9 | 48.56 | +70.99% | -4.60% |
| DROP | 1-shot, F1 | 42.4 | 60.1 | 29.33 | -30.83% | -51.20% |
| MMLU | 5-shot | - | 59.6 | 57.93 | - | -2.80% |
| MMLUpro | 5-shot, CoT | - | 29.2 | - | - | - |
| AGIE | 3-5-shot | - | 42.1 | - | - | - |
| MATH | 4-shot, CoT | - | 24.2 | 40.2 | - | +66.12% |
| GSM8K | 8-shot, CoT | - | 38.4 | 80.21 | - | +108.88% |
| GPQA Diamond | 5-shot, CoT | - | 15 | 31.81 | - | +112.07% |
| MBPP | 3-shot | - | 46 | 60.3 | - | +31.09% |
| HumanE | 0-shot | - | 36 | 68.3 | - | +89.72% |
| Average | | | | | +22.04% | |
Comparison to the Llama series by Meta
Llama 3
The benchmarks and scores in the table below are from the Llama 3 technical report.
| Benchmark | Metric | Llama 3 8B | Motif 2.6B | Improvement |
|---|---|---|---|---|
| MMLU | 5-shot | 69.4 | 57.93 | -16.53% |
| MMLU | 0-shot, CoT | 73 | 57.95 | -20.62% |
| MMLU-Pro | 5-shot, CoT | 48.3 | - | - |
| IFEval | - | 80.4 | 74.02 | -7.94% |
| HumanEval | 0-shot | 72.6 | 68.3 | -5.92% |
| MBPP | 0-shot | 72.8 | 57.93 | -20.43% |
| GSM8K | 8-shot, CoT | 84.5 | 80.21 | -5.08% |
| MATH | 0-shot, CoT | 51.9 | 49.68 | -4.28% |
| ARC Challenge | 0-shot | 83.4 | 74.2 | -11.03% |
| GPQA | 0-shot, CoT | 32.8 | 18.53 | -43.51% |
| Average | | | | -15.04% |
Llama 3.2
The benchmarks and scores in the table below are from the Llama 3.2 official blog.
| Benchmark | Metric | Llama 3.2 1B | Llama 3.2 3B | Motif 2.6B | Improvement (over 1B) | Improvement (over 3B) |
|---|---|---|---|---|---|---|
| MMLU | 0-shot | 49.3 | 63.4 | 57.6 | +16.75% | -9.21% |
| Open-rewrite eval | 0-shot, rougeL | 41.6 | 40.1 | - | - | - |
| TLDR9+ | test, 1-shot, rougeL | 16.8 | 19 | - | - | - |
| IFEval | - | 59.5 | 77.4 | 74.02 | +24.40% | -4.37% |
| GSM8K | 8-shot, CoT | 44.4 | 77.7 | 80.21 | +80.65% | +3.23% |
| MATH | 0-shot, CoT | 30.6 | 48 | 49.68 | +62.35% | +3.50% |
| ARC Challenge | 0-shot | 59.4 | 78.6 | 74.2 | +24.92% | -5.60% |
| GPQA | 0-shot | 27.2 | 32.8 | 25.45 | -6.43% | -22.41% |
| HellaSwag | 0-shot | 41.2 | 69.8 | 61.35 | +48.91% | -12.11% |
| Average | | | | | +41.82% | |
Comparison to the Phi series by Microsoft
The benchmarks and scores in the table below are from the Phi-3 technical report.
| Benchmark | Metric | Phi-3 3.8B | Phi-3 7B | Phi-2 2.7B | Motif 2.6B | Improvement (over 3.8B) | Improvement (over 7B) | Improvement (over 2.7B) |
|---|---|---|---|---|---|---|---|---|
| MMLU | 5-shot | 68.8 | 75.7 | 56.3 | 57.93 | -15.80% | -23.47% | +2.90% |
| HellaSwag | 5-shot | 76.7 | 77 | 53.6 | 68.97 | -10.08% | -10.43% | +28.68% |
| ANLI | 7-shot | 52.8 | 58.1 | 42.5 | 47.99 | -9.11% | -17.40% | +12.92% |
| GSM-8K | 8-shot, CoT | 82.5 | 89.6 | 61.1 | 80.21 | -2.78% | -10.48% | +31.28% |
| MATH | 0-shot, CoT | 41.3 | 34.6 | - | 49.68 | +20.29% | +43.58% | - |
| MedQA | 2-shot | 53.8 | 65.4 | 40.9 | 42.1 | -21.75% | -35.63% | +2.93% |
| AGIEval | 0-shot | 37.5 | 45.1 | 29.8 | - | - | - | - |
| TriviaQA | 5-shot | 64 | 58.1 | 45.2 | 54.97 | -14.11% | -5.39% | +21.62% |
| ARC-C | 10-shot | 84.9 | 90.7 | 75.9 | 75.17 | -11.46% | -17.12% | -0.96% |
| ARC-E | 10-shot | 94.6 | 97 | 88.5 | 88.64 | -6.30% | -8.62% | +0.16% |
| PIQA | 5-shot | 84.2 | 86.9 | 60.2 | 78.29 | -7.02% | -9.91% | +30.05% |
| SociQA | 5-shot | 76.6 | 79.2 | 68.3 | 66.73 | -12.89% | -15.74% | -2.30% |
| BigBench-Hard | 3-shot, CoT | 71.7 | 79.1 | 59.4 | 48.56 | -32.27% | -38.61% | -18.25% |
| WinoGrande | 5-shot | 70.8 | 81.5 | 54.7 | 67.09 | -5.24% | -17.68% | +22.65% |
| OpenBookQA | 10-shot | 83.2 | 88 | 73.6 | 87.8 | +5.53% | -0.23% | +19.29% |
| BoolQ | 2-shot | 77.2 | 84.8 | - | 70.7 | -8.42% | -16.63% | - |
| CommonSenseQA | 10-shot | 80.2 | 80 | 69.3 | 71.25 | -11.16% | -10.94% | +2.81% |
| TruthfulQA | 10-shot | 65 | 70.2 | - | 52.07 | -19.89% | -25.83% | - |
| HumanEval | 0-shot | 58.5 | 61 | 59 | 68.29 | +16.74% | +11.95% | +15.75% |
| MBPP | 3-shot | 70 | 71.7 | 60.6 | 60.3 | -13.86% | -15.90% | -0.50% |
| GPQA | 2-shot, CoT | 32.8 | 34.3 | - | 23.44 | -28.54% | -31.66% | - |
| MT Bench | 2-round avg. | 8.38 | 8.7 | - | 6.77 | -19.21% | -22.18% | - |
| Average | | | | | | -9.87% | -13.25% | |
📚 Evaluation Appendix
In the comparisons above, which use the benchmark scores reported in each model's original technical report, Motif 2.6B shows average performance differences of -15.04% against Llama 3 8B and -13.45% against Gemma 2 9B. However, when compared using the benchmarks and scores in the Qwen2.5 technical report, Motif 2.6B shows an average improvement of +19.27% over Llama 3 8B and +1.68% over Gemma 2 9B. See the table below for details.
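If you instead want to compare models under one pinned configuration, a common option is EleutherAI's lm-evaluation-harness. The sketch below assumes a v0.4-style API (`simple_evaluate`); it is our suggestion for reproducibility, not the procedure used in any of the reports cited above:

```python
from lm_eval import simple_evaluate

# Evaluate under one fixed setting so scores are comparable across models.
results = simple_evaluate(
    model="hf",
    model_args="pretrained=Motif-Technologies/Motif-2.6B,trust_remote_code=True",
    tasks=["arc_challenge"],  # ARC-C
    num_fewshot=25,           # match the 25-shot setting used in the tables above
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```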
Comparison to Llama 3 8B and Gemma 2 9B based on scores from the Qwen2.5 technical report
The benchmarks and scores in the table below are from the Qwen2.5 technical report.
| Benchmark | Metric | Llama 3 8B | Gemma 2 9B | Motif 2.6B | Improvement (over Llama 3 8B) | Improvement (over Gemma 2 9B) |
|---|---|---|---|---|---|---|
| MMLU | 5-shot | 66.6 | 71.3 | 57.93 | -13.02% | -18.75% |
| MMLU-pro | 5-shot | 35.4 | 44.7 | 28.4 | -19.77% | -36.47% |
| MMLU-redux | 5-shot | 61.6 | 67.9 | 59.54 | -3.34% | -12.31% |
| BBH | 3-shot | 57.7 | 68.2 | 39.28 | -31.92% | -42.40% |
| ARC-C | 25-shot | 59.3 | 68.2 | 75.08 | +26.61% | +10.09% |
| TruthfulQA | 0-shot | 44 | 45.3 | 41.55 | -5.56% | -8.27% |
| Winogrande | 5-shot | 77.4 | 79.5 | 67.09 | -13.32% | -15.61% |
| HellaSwag | 10-shot | 82.1 | 81.9 | 69.88 | -14.88% | -14.68% |
| GPQA | 5-shot | 25.8 | 32.8 | 29.24 | +13.33% | -10.85% |
| TheoremQA | 5-shot | 22.1 | 28.9 | - | - | - |
| MATH | 4-shot | 20.5 | 37.7 | 40.2 | +96.10% | +6.63% |
| MMLU-stem | 5-shot | 55.3 | 65.1 | 52.9 | -4.34% | -18.74% |
| GSM8K | 4-shot | 55.3 | 70.7 | 75.2 | +35.99% | +6.36% |
| HumanEval | 0-shot | 33.5 | 37.8 | 68.3 | +103.88% | +80.69% |
| HumanEval+ | 0-shot | 29.3 | 30.5 | 62.2 | +112.29% | +103.93% |
| MBPP | 0-shot | 53.9 | 62.2 | 60.3 | +11.87% | -3.05% |
| MBPP+ | 0-shot | 44.4 | 50.6 | 50.8 | +14.41% | +0.40% |
| MultiPL-E | 0-shot | 22.6 | 34.9 | - | - | - |
| Average | | | | | +19.27% | +1.68% |
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Custom modeling code ships with the checkpoint, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    "Motif-Technologies/Motif-2.6B",
    trust_remote_code=True,
    _attn_implementation="eager",  # also supports "flash_attention_2"
).cuda()

tokenizer = AutoTokenizer.from_pretrained(
    "Motif-Technologies/Motif-2.6B",
    trust_remote_code=True,
)

query = "What is the capital city of South Korea?"

# Build the chat-formatted prompt and move it to the GPU.
input_ids = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": query},
    ],
    add_generation_prompt=True,
    return_tensors="pt",
).cuda()

output = model.generate(input_ids, max_new_tokens=128, pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens, not the prompt.
output = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(output)
"""
The capital city of South Korea is Seoul. Located in the southern part of the country, Seoul is not only the largest city in South Korea but also one of the largest metropolitan areas in the world.
It is a vibrant and dynamic city known for its rich history, cultural heritage, and modern amenities. Seoul is a major economic, cultural, and political center in East Asia, and it plays a crucial role in the region's politics, economy, and culture.
The city is divided into different administrative districts, each with its own unique characteristics and attractions.
"""
```
📄 License
- License: other
- License Name: motif-license
- License Link: LICENSE

