Midm-2.0-Base-Instruct GGUF Models
This repository provides GGUF builds of the Midm-2.0-Base-Instruct model for text generation tasks. The files are quantized with particular attention to the precision of key layers, and the model performs strongly in the evaluations reported below.
Quick Start
Here is a code snippet for running conversational inference with the model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "K-intelligence/Midm-2.0-Base-Instruct"

# Load the model and tokenizer (bfloat16, automatic device placement)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
generation_config = GenerationConfig.from_pretrained(model_name)

# The original prompt and system message are written in Korean;
# English translations are used here.
prompt = "Tell me about KT."
messages = [
    {"role": "system",
     "content": "Mi:dm is an AI-based assistant developed by KT."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to("cuda"),
    generation_config=generation_config,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
```
Important Note
The `transformers` library should be version 4.45.0 or higher.
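As a quick sanity check, the snippet below verifies the installed version at runtime. This is a minimal sketch; the 4.45.0 floor comes from the note above, and `packaging` ships as a dependency of `transformers`.

```python
from packaging import version
import transformers

# Abort early if the installed transformers release is older than the documented minimum.
MINIMUM = "4.45.0"
if version.parse(transformers.__version__) < version.parse(MINIMUM):
    raise RuntimeError(
        f"transformers {transformers.__version__} found; please upgrade to >= {MINIMUM}"
    )
```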
Features
- Korea-centric AI: Mi:dm 2.0 is a "Korea-centric AI" model that deeply internalizes the unique values, cognitive frameworks, and commonsense reasoning of Korean society.
- Two Versions: It is released in two versions: the 11.5B-parameter dense Mi:dm 2.0 Base for balanced performance, and the 2.3B-parameter dense Mi:dm 2.0 Mini for on-device and limited-GPU environments.
- New Quantization Approach: The GGUF files use a quantization approach that selectively elevates the precision of key layers, yielding higher precision for a given quantization level.
Installation
To serve Mi:dm 2.0 using vLLM (>= 0.8.0) with an OpenAI-compatible API:
```bash
vllm serve K-intelligence/Midm-2.0-Base-Instruct
```
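Once the server is running, it exposes an OpenAI-compatible endpoint. The sketch below queries it with the `openai` Python client; the base URL assumes vLLM's default of `http://localhost:8000/v1`, and the API key is a placeholder since a local server does not check it by default.

```python
from openai import OpenAI

# Point the client at the local vLLM server (OpenAI-compatible API).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="K-intelligence/Midm-2.0-Base-Instruct",
    messages=[
        {"role": "system", "content": "Mi:dm is an AI-based assistant developed by KT."},
        {"role": "user", "content": "Tell me about KT."},
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```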
Usage Examples
Run on Friendli.AI
You can try our model immediately via Friendli.AI. Simply click Deploy and then Friendli Endpoints.
Important Note
Please note that a login to Friendli.AI is required after your fifth chat interaction.
Run on Your Local Machine
We provide detailed instructions for running Mi:dm 2.0 on your local machine with llama.cpp, LM Studio, and Ollama; please check our GitHub for more information.
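As one illustration of a local setup, the GGUF files can be loaded with the `llama-cpp-python` bindings. This is a minimal sketch; the filename is a hypothetical quantized variant and should be replaced with the file you actually downloaded.

```python
from llama_cpp import Llama

# Load a local GGUF file (replace the path with the quant you downloaded).
llm = Llama(model_path="./Midm-2.0-Base-Instruct-Q4_K_M.gguf", n_ctx=4096)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Mi:dm is an AI-based assistant developed by KT."},
        {"role": "user", "content": "Tell me about KT."},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```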
Deployment
```bash
vllm serve K-intelligence/Midm-2.0-Base-Instruct
```
Tutorials
To help our end users easily use Mi:dm 2.0, we have provided comprehensive tutorials on GitHub.
Documentation
Model Generation Details
This model was generated using llama.cpp at commit 21c02174.
Quantization Beyond the IMatrix
I've been experimenting with a new quantization approach that selectively elevates the precision of key layers beyond what the default IMatrix configuration provides.
In my testing, standard IMatrix quantization underperforms at lower bit depths, especially with Mixture of Experts (MoE) models. To address this, I'm using the `--tensor-type` option in llama.cpp to manually "bump" important layers to higher precision. You can see the implementation here:
[Layer bumping with llama.cpp](https://github.com/Mungert69/GGUFModelBuilder/blob/main/model-converter/tensor_list_builder.py)
While this does increase model file size, it significantly improves precision for a given quantization level.
Evaluation
Korean
| Model | Society & Culture (K-Refer, K-Refer-Hard, Ko-Sovereign, HAERAE, Avg.) | General Knowledge (KMMLU, Ko-Sovereign, Avg.) | Instruction Following (Ko-IFEval, Ko-MTBench, Avg.) | Comprehension (K-Prag, K-Refer-Hard, Ko-Best, Ko-Sovereign, Avg.) | Reasoning (Ko-Winogrande, Ko-Best, LogicKor, HRM8K, Avg.) |
|---|---|---|---|---|---|
| Qwen3-4B | 53.6, 42.9, 35.8, 50.6, 45.7 | 50.6, 42.5, 46.5 | 75.9, 63.0, 69.4 | 73.9, 56.7, 91.5, 43.5, 66.6 | 67.5, 69.2, 5.6, 56.7, 43.8 |
| Exaone-3.5-2.4B-inst | 64.0, 67.1, 44.4, 61.3, 59.2 | 43.5, 42.4, 43.0 | 65.4, 74.0, 68.9 | 68.7, 58.5, 87.2, 38.0, 62.5 | 60.3, 64.1, 7.4, 38.5, 36.7 |
| Mi:dm 2.0-Mini-inst | 66.4, 61.4, 36.7, 70.8, 58.8 | 45.1, 42.4, 43.8 | 73.3, 74.0, 73.6 | 69.5, 55.4, 80.5, 42.5, 61.9 | 61.7, 64.5, 7.7, 39.9, 37.4 |
| Qwen3-14B | 72.4, 65.7, 49.8, 68.4, 64.1 | 55.4, 54.7, 55.1 | 83.6, 71.0, 77.3 | 86.7, 74.0, 93.9, 52.0, 76.8 | 77.2, 75.4, 6.4, 64.5, 48.8 |
| Llama-3.1-8B-inst | 43.2, 36.4, 33.8, 49.5, 40.7 | 33.0, 36.7, 34.8 | 60.1, 57.0, 58.5 | 59.9, 48.6, 77.4, 31.5, 51.5 | 40.1, 26.0, 2.4, 30.9, 19.8 |
| Exaone-3.5-7.8B-inst | 71.6, 69.3, 46.9, 72.9, 65.2 | 52.6, 45.6, 49.1 | 69.1, 79.6, 74.4 | 73.5, 61.9, 92.0, 44.0, 67.2 | 64.6, 60.3, 8.6, 49.7, 39.5 |
| Mi:dm 2.0-Base-inst | 89.6, 86.4, 56.3, 81.5, 78.4 | 57.3, 58.0, 57.7 | 82.0, 89.7, 85.9 | 86.5, 70.8, 95.2, 53.0, 76.1 | 75.1, 73.0, 8.6, 52.9, 44.8 |
English
| Model | Instruction (IFEval) | Reasoning (BBH, GPQA, MuSR, Avg.) | Math (GSM8K) | Coding (MBPP+) | General Knowledge (MMLU-pro, MMLU, Avg.) |
|---|---|---|---|---|---|
| Qwen3-4B | 79.7 | 79.0, 39.8, 58.5, 59.1 | 90.4 | 62.4 | -, 73.3, 73.3 |
| Exaone-3.5-2.4B-inst | 81.1 | 46.4, 28.1, 49.7, 41.4 | 82.5 | 59.8 | -, 59.5, 59.5 |
| Mi:dm 2.0-Mini-inst | 73.6 | 44.5, 26.6, 51.7, 40.9 | 83.1 | 60.9 | -, 56.5, 56.5 |
| Qwen3-14B | 83.9 | 83.4, 49.8, 57.7, 63.6 | 88.0 | 73.4 | 70.5, 82.7, 76.6 |
| Llama-3.1-8B-inst | 79.9 | 60.3, 21.6, 50.3, 44.1 | 81.2 | 81.8 | 47.6, 70.7, 59.2 |
| Exaone-3.5-7.8B-inst | 83.6 | 50.1, 33.1, 51.2, 44.8 | 81.1 | 79.4 | 40.7, 69.0, 54.8 |
| Mi:dm 2.0-Base-inst | 84.0 | 77.7, 33.5, 51.9, 54.4 | 91.6 | 77.5 | 53.3, 73.7, 63.5 |
Technical Details
The GGUF files are produced with llama.cpp, and the quantization approach described above selectively raises the precision of key layers, which helps preserve model quality, especially for Mixture of Experts (MoE) models.
License
Mi:dm 2.0 is licensed under the MIT License.
If you find these models useful
Help me test my AI-Powered Quantum Network Monitor Assistant with quantum-ready security checks:
Quantum Network Monitor
The full open-source code for the Quantum Network Monitor Service is available at my GitHub repos (repos with NetworkMonitor in the name): Source Code Quantum Network Monitor. You will also find the code I use to quantize the models, if you want to do it yourself: GGUFModelBuilder.
How to test:
Choose an AI assistant type:
- TurboLLM (GPT-4.1-mini)
- HugLLM (Hugging Face open-source models)
- TestLLM (Experimental CPU-only)
What I'm Testing
I'm pushing the limits of small open-source models for AI network monitoring, specifically:
- Function calling against live network services
- How small can a model go while still handling:
  - Automated Nmap security scans
  - Quantum-readiness checks
  - Network monitoring tasks
TestLLM, the current experimental model (llama.cpp on 2 CPU threads in a Hugging Face Docker space):
- Zero-configuration setup
- 30s load time (slow inference, but no API costs); no token limit, since the cost is low.
- Help wanted! If you're into edge-device AI, let's collaborate.