Midm-2.0-Base-Instruct GGUF Models
This repository provides GGUF builds of the Midm-2.0-Base-Instruct model for text generation tasks. The files are quantized with particular attention to the precision of key layers, and the model performs strongly in the evaluations reported below.
Quick Start
Here is a code snippet for running conversational inference with the model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "K-intelligence/Midm-2.0-Base-Instruct"

# Load the model and tokenizer (bfloat16, automatic device placement)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
generation_config = GenerationConfig.from_pretrained(model_name)

# The original prompt and system message are written in Korean;
# English translations are used here.
prompt = "Tell me about KT."
messages = [
    {"role": "system",
     "content": "Mi:dm is an AI-based assistant developed by KT."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    generation_config=generation_config,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
```
Important Note
The `transformers` library should be version 4.45.0 or higher.
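As a quick sanity check, the snippet below verifies the installed version at runtime. This is a minimal sketch; the 4.45.0 floor comes from the note above, and `packaging` ships as a dependency of `transformers`.

```python
from packaging import version
import transformers

# Abort early if the installed transformers release is older than the documented minimum.
MINIMUM = "4.45.0"
if version.parse(transformers.__version__) < version.parse(MINIMUM):
    raise RuntimeError(
        f"transformers {transformers.__version__} found; please upgrade to >= {MINIMUM}"
    )
```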
Features
- Korea-centric AI: Mi:dm 2.0 is a "Korea-centric AI" model that deeply internalizes the unique values, cognitive frameworks, and commonsense reasoning of Korean society.
- Two Versions: It is released in two versions: the 11.5B-parameter dense Mi:dm 2.0 Base for balanced performance, and the 2.3B-parameter dense Mi:dm 2.0 Mini for on-device and limited-GPU environments.
- New Quantization Approach: The GGUF files use a quantization approach that selectively elevates the precision of key layers, yielding higher precision for a given quantization level.
Installation
To serve Mi:dm 2.0 using vLLM (>= 0.8.0) with an OpenAI-compatible API:
```bash
vllm serve K-intelligence/Midm-2.0-Base-Instruct
```
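Once the server is running, it exposes an OpenAI-compatible endpoint. The sketch below queries it with the `openai` Python client; the base URL assumes vLLM's default of `http://localhost:8000/v1`, and the API key is a placeholder since a local server does not check it by default.

```python
from openai import OpenAI

# Point the client at the local vLLM server (OpenAI-compatible API).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="K-intelligence/Midm-2.0-Base-Instruct",
    messages=[
        {"role": "system", "content": "Mi:dm is an AI-based assistant developed by KT."},
        {"role": "user", "content": "Tell me about KT."},
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```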
Usage Examples
Run on Friendli.AI
You can try our model immediately via Friendli.AI. Simply click Deploy and then Friendli Endpoints.
Important Note
Please note that a login to Friendli.AI is required after your fifth chat interaction.
Run on Your Local Machine
We provide detailed instructions for running Mi:dm 2.0 on your local machine with llama.cpp, LM Studio, and Ollama; please check our GitHub for more information.
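As one illustration of a local setup, the GGUF files can be loaded with the `llama-cpp-python` bindings. This is a minimal sketch; the filename is a hypothetical quantized variant and should be replaced with the file you actually downloaded.

```python
from llama_cpp import Llama

# Load a local GGUF file (replace the path with the quant you downloaded).
llm = Llama(model_path="./Midm-2.0-Base-Instruct-Q4_K_M.gguf", n_ctx=4096)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Mi:dm is an AI-based assistant developed by KT."},
        {"role": "user", "content": "Tell me about KT."},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```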
Deployment
```bash
vllm serve K-intelligence/Midm-2.0-Base-Instruct
```
Tutorials
To help our end users easily use Mi:dm 2.0, we have provided comprehensive tutorials on GitHub.
Documentation
Model Generation Details
This model was generated using llama.cpp at commit 21c02174.
Quantization Beyond the IMatrix
I've been experimenting with a new quantization approach that selectively elevates the precision of key layers beyond what the default IMatrix configuration provides.
In my testing, standard IMatrix quantization underperforms at lower bit depths, especially with Mixture of Experts (MoE) models. To address this, I'm using the `--tensor-type` option in llama.cpp to manually "bump" important layers to higher precision. You can see the implementation here:
[Layer bumping with llama.cpp](https://github.com/Mungert69/GGUFModelBuilder/blob/main/model-converter/tensor_list_builder.py)
While this does increase model file size, it significantly improves precision for a given quantization level.
Evaluation
Korean
| Model | Society & Culture (K-Refer, K-Refer-Hard, Ko-Sovereign, HAERAE, Avg.) | General Knowledge (KMMLU, Ko-Sovereign, Avg.) | Instruction Following (Ko-IFEval, Ko-MTBench, Avg.) | Comprehension (K-Prag, K-Refer-Hard, Ko-Best, Ko-Sovereign, Avg.) | Reasoning (Ko-Winogrande, Ko-Best, LogicKor, HRM8K, Avg.) |
|---|---|---|---|---|---|
| Qwen3-4B | 53.6, 42.9, 35.8, 50.6, 45.7 | 50.6, 42.5, 46.5 | 75.9, 63.0, 69.4 | 73.9, 56.7, 91.5, 43.5, 66.6 | 67.5, 69.2, 5.6, 56.7, 43.8 |
| Exaone-3.5-2.4B-inst | 64.0, 67.1, 44.4, 61.3, 59.2 | 43.5, 42.4, 43.0 | 65.4, 74.0, 68.9 | 68.7, 58.5, 87.2, 38.0, 62.5 | 60.3, 64.1, 7.4, 38.5, 36.7 |
| Mi:dm 2.0-Mini-inst | 66.4, 61.4, 36.7, 70.8, 58.8 | 45.1, 42.4, 43.8 | 73.3, 74.0, 73.6 | 69.5, 55.4, 80.5, 42.5, 61.9 | 61.7, 64.5, 7.7, 39.9, 37.4 |
| Qwen3-14B | 72.4, 65.7, 49.8, 68.4, 64.1 | 55.4, 54.7, 55.1 | 83.6, 71.0, 77.3 | 86.7, 74.0, 93.9, 52.0, 76.8 | 77.2, 75.4, 6.4, 64.5, 48.8 |
| Llama-3.1-8B-inst | 43.2, 36.4, 33.8, 49.5, 40.7 | 33.0, 36.7, 34.8 | 60.1, 57.0, 58.5 | 59.9, 48.6, 77.4, 31.5, 51.5 | 40.1, 26.0, 2.4, 30.9, 19.8 |
| Exaone-3.5-7.8B-inst | 71.6, 69.3, 46.9, 72.9, 65.2 | 52.6, 45.6, 49.1 | 69.1, 79.6, 74.4 | 73.5, 61.9, 92.0, 44.0, 67.2 | 64.6, 60.3, 8.6, 49.7, 39.5 |
| Mi:dm 2.0-Base-inst | 89.6, 86.4, 56.3, 81.5, 78.4 | 57.3, 58.0, 57.7 | 82.0, 89.7, 85.9 | 86.5, 70.8, 95.2, 53.0, 76.1 | 75.1, 73.0, 8.6, 52.9, 44.8 |
English
| Model | Instruction (IFEval) | Reasoning (BBH, GPQA, MuSR, Avg.) | Math (GSM8K) | Coding (MBPP+) | General Knowledge (MMLU-pro, MMLU, Avg.) |
|---|---|---|---|---|---|
| Qwen3-4B | 79.7 | 79.0, 39.8, 58.5, 59.1 | 90.4 | 62.4 | -, 73.3, 73.3 |
| Exaone-3.5-2.4B-inst | 81.1 | 46.4, 28.1, 49.7, 41.4 | 82.5 | 59.8 | -, 59.5, 59.5 |
| Mi:dm 2.0-Mini-inst | 73.6 | 44.5, 26.6, 51.7, 40.9 | 83.1 | 60.9 | -, 56.5, 56.5 |
| Qwen3-14B | 83.9 | 83.4, 49.8, 57.7, 63.6 | 88.0 | 73.4 | 70.5, 82.7, 76.6 |
| Llama-3.1-8B-inst | 79.9 | 60.3, 21.6, 50.3, 44.1 | 81.2 | 81.8 | 47.6, 70.7, 59.2 |
| Exaone-3.5-7.8B-inst | 83.6 | 50.1, 33.1, 51.2, 44.8 | 81.1 | 79.4 | 40.7, 69.0, 54.8 |
| Mi:dm 2.0-Base-inst | 84.0 | 77.7, 33.5, 51.9, 54.4 | 91.6 | 77.5 | 53.3, 73.7, 63.5 |
Technical Details
The GGUF files are produced with llama.cpp, and the quantization approach described above selectively raises the precision of key layers, which helps preserve model quality, especially for Mixture of Experts (MoE) models.
License
Mi:dm 2.0 is licensed under the MIT License.
If you find these models useful
Help me test my AI-Powered Quantum Network Monitor Assistant with quantum-ready security checks:
Quantum Network Monitor
The full open-source code for the Quantum Network Monitor Service is available at my GitHub repos (repos with NetworkMonitor in the name): Source Code Quantum Network Monitor. You will also find the code I use to quantize the models, if you want to do it yourself: GGUFModelBuilder.
How to test:
Choose an AI assistant type:
- TurboLLM (GPT-4.1-mini)
- HugLLM (Hugging Face open-source models)
- TestLLM (Experimental CPU-only)
What I'm Testing
I'm pushing the limits of small open-source models for AI network monitoring, specifically:
- Function calling against live network services
- How small can a model go while still handling:
  - Automated Nmap security scans
  - Quantum-readiness checks
  - Network monitoring tasks
TestLLM, the current experimental model (llama.cpp on 2 CPU threads in a Hugging Face Docker space):
- Zero-configuration setup
- 30s load time (slow inference, but no API costs); no token limit, since the cost is low.
- Help wanted! If you're into edge-device AI, let's collaborate.