EXAONE-4.0-1.2B: an open-source multilingual large language model with agentic tool calling

EXAONE 4.0 1.2B

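Developed by LGAI-EXAONE

EXAONE 4.0 is a multilingual large language model that integrates a non-reasoning mode and a reasoning mode. It supports English, Korean, and Spanish, and can call tools as an agent.

Large language model

Transformers

Multilingual · License: other · #hybrid-reasoning-mode #multilingual-agent #tool-calling

Downloads: 189

Released: 7/11/2025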

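Model overview

EXAONE 4.0 is a large language model developed by LG AI Research. It combines the usability of EXAONE 3.5 with the advanced reasoning of EXAONE Deep, and supports multiple languages and agentic tool use.

Model features

Dual-mode integration

Supports both a non-reasoning mode and a reasoning mode, balancing ease of use with advanced reasoning.

Multilingual support

Supports English, Korean, and Spanish, including cross-lingual tasks.

Agentic tool use

Can call tools; provide tool schemas to the model so that it can call them effectively.

Hybrid attention

Combines local attention and global attention in a hybrid scheme to improve context understanding.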

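Model capabilities

Text generation

Complex reasoning

Multilingual processing

Tool calling

Long-context processing

Use cases

General question answering

Multilingual Q&A

Answer user questions in English, Korean, or Spanish.

Mathematical reasoning

Numeric comparison

Compare two numbers and explain which one is larger and why.

Agent applications

Dice simulation

Simulate dice rolls through tool calls.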

🚀 EXAONE-4.0-1.2B

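EXAONE 4.0 integrates a non-reasoning mode and a reasoning mode, combining the usability of EXAONE 3.5 with the advanced reasoning of EXAONE Deep. To pave the way for the agentic AI era, EXAONE 4.0 adds essential features such as agentic tool use, and it extends multilingual support to Spanish in addition to English and Korean.

The EXAONE 4.0 model series comes in two sizes: a mid-size 32B model optimized for high performance, and a small 1.2B model designed for on-device applications.

🎉 License updated! We are pleased to announce more flexible licensing terms 🤗
✈️ Try it on FriendliAI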

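🚀 Quick start

You need to install the transformers library forked from the original, which is available in our PR. We will update this section once the PR is merged and released.

You can install the latest version of transformers with EXAONE 4.0 support using the following command: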

pip install git+https://github.com/lgai-exaone/transformers@add-exaone4

Non-reasoning mode

For general use, you can call the EXAONE 4.0 model with the following example code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-4.0-1.2B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="bfloat16",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose one prompt; uncomment the one you want to use
prompt = "Explain how wonderful you are"
# prompt = "Explica lo increíble que eres"
# prompt = "너가 얼마나 대단한지 설명해 봐"

messages = [
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))

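Reasoning mode

EXAONE 4.0 models can reason through complex problems. To activate reasoning mode, pass enable_thinking=True to the tokenizer's apply_chat_template call. This opens a reasoning block that starts with a <think> tag and is left unclosed. The snippet below reuses the model and tokenizer loaded in the previous example.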

messages = [
    {"role": "user", "content": "Which one is bigger, 3.12 vs 3.9?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True,
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=True,
    temperature=0.6,
    top_p=0.95
)
print(tokenizer.decode(output[0]))
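When reasoning mode is on, the decoded text contains the reasoning trace followed by the final answer. The sketch below separates the two, assuming the model ends its reasoning with a closing `</think>` tag. The source only states that the template opens `<think>`, so the closing tag, the helper name `split_reasoning`, and the sample string are assumptions for illustration. If generation stops before the closing tag appears (for example because `max_new_tokens` is too small), the whole tail is treated as unfinished reasoning.

```python
def split_reasoning(decoded: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Return (reasoning, answer) from decoded model output.

    ASSUMPTION: the reasoning block is delimited by open_tag/close_tag;
    the closing tag is not documented in the model card text.
    """
    # Keep only what follows the last opening tag (the prompt precedes it).
    _, found_open, tail = decoded.rpartition(open_tag)
    if not found_open:
        # No reasoning block at all: everything is the answer.
        return "", decoded.strip()
    reasoning, found_close, answer = tail.partition(close_tag)
    if not found_close:
        # Generation was cut off before the block closed.
        return reasoning.strip(), ""
    return reasoning.strip(), answer.strip()


# Hand-written sample, not real model output.
sample = "Which one is bigger?<think>\n3.9 = 3.90 > 3.12\n</think>\n\n3.9 is bigger."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

In practice you would call `split_reasoning(tokenizer.decode(output[0]))` on the output of the example above.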

⚠️ Important

Generation in reasoning mode can be sensitive to sampling parameters. Refer to the usage guidelines for better generation quality.
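The examples above use greedy decoding for non-reasoning mode and sampling with `temperature=0.6` and `top_p=0.95` for reasoning mode. A small helper can keep those choices in one place. The function name `generation_kwargs` and the `mode` labels are hypothetical, and the values simply mirror the examples above rather than tuned recommendations.

```python
def generation_kwargs(mode: str, max_new_tokens: int = 128) -> dict:
    """Build keyword arguments for model.generate() for a given mode.

    `mode` is a hypothetical label: "non-reasoning" or "reasoning".
    """
    if mode == "non-reasoning":
        # Greedy decoding, as in the non-reasoning example.
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    if mode == "reasoning":
        # Sampling settings taken from the reasoning example.
        return {
            "max_new_tokens": max_new_tokens,
            "do_sample": True,
            "temperature": 0.6,
            "top_p": 0.95,
        }
    raise ValueError(f"unknown mode: {mode!r}")


print(generation_kwargs("reasoning"))
```

With the model loaded, the call would look like `model.generate(input_ids.to(model.device), **generation_kwargs("reasoning"))`.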

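Agentic tool use

EXAONE 4.0 models can act as agents thanks to their tool-calling capability. Provide tool schemas to the model so that it can call tools effectively.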

import random

def roll_dice(max_num: int):
    return random.randint(1, max_num)

tools = [
    {
        "type": "function",
        "function": {
            "name": "roll_dice",
            "description": "Roll a dice with the number 1 to N. User can select the number N.",
            "parameters": {
                "type": "object",
                "required": ["max_num"],
                "properties": {
                    "max_num": {
                        "type": "integer",
                        "description": "Max number of the dice"
                    }
                }
            }
        }
    }
]

messages = [
    {"role": "user", "content": "Roll D6 dice twice!"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    tools=tools,
)

output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
print(tokenizer.decode(output[0]))
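The model only emits a tool call as text; the caller still has to parse it and run the function. The sketch below shows one way to do that, assuming each call appears as JSON with `name` and `arguments` keys wrapped in `<tool_call>...</tool_call>` tags. That output format, the `TOOL_REGISTRY` dict, the `run_tool_calls` helper, and the sample string are assumptions for illustration, so check the decoded output of your model before relying on them.

```python
import json
import random
import re


def roll_dice(max_num: int) -> int:
    return random.randint(1, max_num)


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"roll_dice": roll_dice}

# ASSUMED wire format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def run_tool_calls(decoded: str) -> list:
    """Execute every tool call found in the decoded text.

    Returns one {"role": "tool", ...} message per successful call.
    """
    results = []
    for raw in TOOL_CALL_RE.findall(decoded):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Skip malformed calls instead of crashing.
        if not isinstance(call, dict):
            continue
        func = TOOL_REGISTRY.get(call.get("name"))
        if func is None:
            continue  # Unknown tool name: ignore it.
        value = func(**call.get("arguments", {}))
        results.append({"role": "tool", "content": json.dumps(value)})
    return results


# Hand-written sample, not real model output.
sample = (
    '<tool_call>{"name": "roll_dice", "arguments": {"max_num": 6}}</tool_call>\n'
    '<tool_call>{"name": "roll_dice", "arguments": {"max_num": 6}}</tool_call>'
)
for message in run_tool_calls(sample):
    print(message)
```

The returned messages could then be appended to `messages` and passed through `apply_chat_template` again so that the model can write its final answer, provided the chat template accepts a `tool` role.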

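✨ Key features

Dual-mode integration: combines a non-reasoning mode and a reasoning mode, balancing ease of use with advanced reasoning.
Multilingual support: supports English, Korean, and Spanish.
Agentic tool use: can call tools effectively when given tool schemas.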

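📚 Documentation

Model configuration

Property	Value
Number of parameters (excluding embeddings)	1.07B
Number of layers	30
Number of attention heads	GQA with 32 heads and 8 KV heads
Vocabulary size	102,400
Context length	65,536 tokens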
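To see what this configuration implies for memory, the worked example below estimates the key/value cache size for one sequence at full context length. The layer count, head counts, and context length come from the table. The head dimension of 64 and the 2-byte (bfloat16) cache entries are assumptions, since the table does not list them, and the estimate assumes every layer caches the full context.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Size of the K and V caches for one sequence, in bytes."""
    # The factor 2 accounts for storing both keys and values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


N_LAYERS = 30          # from the table
N_HEADS = 32           # query heads, from the table
N_KV_HEADS = 8         # from the table
CONTEXT_LEN = 65_536   # from the table
HEAD_DIM = 64          # ASSUMPTION: not stated in the table

queries_per_kv_head = N_HEADS // N_KV_HEADS
gqa_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT_LEN)
mha_bytes = kv_cache_bytes(N_LAYERS, N_HEADS, HEAD_DIM, CONTEXT_LEN)

print(f"query heads per KV head: {queries_per_kv_head}")
print(f"cache with GQA: {gqa_bytes / 2**30:.2f} GiB")
print(f"cache with one KV head per query head: {mha_bytes / 2**30:.2f} GiB")
```

Whatever the true head dimension is, sharing each KV head among four query heads shrinks the cache by a factor of four compared with one KV head per query head.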

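Deployment

TensorRT-LLM

The latest commits of TensorRT-LLM officially support EXAONE 4.0 models. Until a release includes that support, you need to clone the TensorRT-LLM repository and build it from source.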

git clone https://github.com/NVIDIA/TensorRT-LLM.git

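After cloning the repository, build it from source to complete the installation. Refer to the official documentation for setting up the TensorRT-LLM environment.

To run the TensorRT-LLM server, follow these steps:

Write an extra configuration YAML file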

# extra_llm_api_config.yaml
kv_cache_config:
  enable_block_reuse: false

Run the server with the configuration file

trtllm-serve serve [MODEL_PATH] --backend pytorch --extra_llm_api_options extra_llm_api_config.yaml

For more details, refer to the EXAONE documentation in TensorRT-LLM.
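Once the server is running, a client can send it chat requests. The sketch below builds such a request with the standard library only, assuming the server exposes an OpenAI-compatible chat completions endpoint at `http://localhost:8000/v1/chat/completions`. The host, port, path, payload fields, and the `model` value are assumptions to verify against the TensorRT-LLM documentation; `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# ASSUMPTION: address and path of the running server; adjust to your setup.
BASE_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt: str, model: str, url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request(
    "Explain how wonderful you are",
    model="LGAI-EXAONE/EXAONE-4.0-1.2B",
)
print(request.full_url)
# reply = send(request)  # uncomment once the server is up
```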

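⚠️ Important

Other inference engines, including vllm and sglang, do not officially support EXAONE 4.0 yet. We will update this section as soon as those libraries are updated.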

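Performance

The tables below show the evaluation results of each model in reasoning and non-reasoning modes. Evaluation details are available in the technical report.

✅ indicates that the model supports hybrid reasoning, so it can run in either reasoning or non-reasoning mode as needed.
To assess practical and professional knowledge in Korean, we use two benchmarks, KMMLU-Redux and KMMLU-Pro. Both datasets are publicly released.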

32B reasoning mode

	EXAONE 4.0 32B	Phi 4 reasoning-plus	Magistral Small-2506	Qwen 3 32B	Qwen 3 235B	DeepSeek R1-0528
Model size	32.0B	14.7B	23.6B	32.8B	235B	671B
Hybrid reasoning	✅			✅	✅
World knowledge
MMLU-Redux	92.3	90.8	86.8	90.9	92.7	93.4
MMLU-Pro	81.8	76.0	73.4	80.0	83.0	85.0
GPQA-Diamond	75.4	68.9	68.2	68.4	71.1	81.0
Math/Coding
AIME 2025	85.3	78.0	62.8	72.9	81.5	87.5
HMMT Feb 2025	72.9	53.6	43.5	50.4	62.5	79.4
LiveCodeBench v5	72.6	51.7	55.8	65.7	70.7	75.2
LiveCodeBench v6	66.7	47.1	47.4	60.1	58.9	70.3
Instruction following
IFEval	83.7	84.9	37.9	85.0	83.4	80.8
Multi-IF (EN)	73.5	56.1	27.4	73.4	73.4	72.0
Agentic tool use
BFCL-v3	63.9	N/A	40.4	70.3	70.8	64.7
Tau-bench (Airline)	51.5	N/A	38.5	34.5	37.5	53.5
Tau-bench (Retail)	62.8	N/A	10.2	55.2	58.3	63.9
Multilinguality
KMMLU-Pro	67.7	55.8	51.5	61.4	68.1	71.7
KMMLU-Redux	72.7	62.7	54.6	67.5	74.5	77.0
KSM	87.6	79.8	71.9	82.8	86.2	86.7
MMMLU (ES)	85.6	84.3	68.9	82.8	86.7	88.2
MATH500 (ES)	95.8	94.2	83.5	94.3	95.1	96.0

32B non-reasoning mode

	EXAONE 4.0 32B	Phi 4	Mistral-Small-2506	Gemma 3 27B	Qwen3 32B	Qwen3 235B	Llama-4-Maverick	DeepSeek V3-0324
Model size	32.0B	14.7B	24.0B	27.4B	32.8B	235B	402B	671B
Hybrid reasoning	✅				✅	✅
World knowledge
MMLU-Redux	89.8	88.3	85.9	85.0	85.7	89.2	92.3	92.3
MMLU-Pro	77.6	70.4	69.1	67.5	74.4	77.4	80.5	81.2
GPQA-Diamond	63.7	56.1	46.1	42.4	54.6	62.9	69.8	68.4
Math/Coding
AIME 2025	35.9	17.8	30.2	23.8	20.2	24.7	18.0	50.0
HMMT Feb 2025	21.8	4.0	16.9	10.3	9.8	11.9	7.3	29.2
LiveCodeBench v5	43.3	24.6	25.8	27.5	31.3	35.3	43.4	46.7
LiveCodeBench v6	43.1	27.4	26.9	29.7	28.0	31.4	32.7	44.0
Instruction following
IFEval	84.8	63.0	77.8	82.6	83.2	83.2	85.4	81.2
Multi-IF (EN)	71.6	47.7	63.2	72.1	71.9	72.5	77.9	68.3
Long-context processing
HELMET	58.3	N/A	61.9	58.3	54.5	63.3	13.7	N/A
RULER	88.2	N/A	71.8	66.0	85.6	90.6	2.9	N/A
LongBench v1	48.1	N/A	51.5	51.5	44.2	45.3	34.7	N/A
Agentic tool use
BFCL-v3	65.2	N/A	57.7	N/A	63.0	68.0	52.9	63.8
Tau-Bench (Airline)	25.5	N/A	36.1	N/A	16.0	27.0	38.0	40.5
Tau-Bench (Retail)	55.9	N/A	35.5	N/A	42.5	52.0	38.0	55.0

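🔧 Technical details

Compared with previous EXAONE models, the EXAONE 4.0 architecture introduces the following changes:

Hybrid attention: for the 32B model, we adopt a hybrid attention scheme that combines local attention (sliding window attention) with global attention (full attention) in a 3:1 ratio. To better capture global context, we do not use RoPE (rotary positional embedding) in global attention.
QK-Reorder-Norm: we adopt a Post-LN (layer normalization) scheme in the Transformer blocks instead of Pre-LN, and add RMS normalization right after the Q and K projections. Although this consumes more computation, it helps achieve better performance on downstream tasks.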
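The two changes above can be illustrated in a few lines of plain Python. This is a sketch, not the model's implementation: the ordering of local and global layers within each group of four, the window size, the epsilon, and the omission of a learned RMSNorm scale are all assumptions made for illustration, and all function names are hypothetical.

```python
import math


def layer_schedule(n_layers: int, local_per_global: int = 3) -> list:
    """Assign 'local' or 'global' attention to each layer at a 3:1 ratio.

    ASSUMPTION: each group is three local layers followed by one global layer.
    """
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(n_layers)]


def attention_mask(seq_len: int, kind: str, window: int = 4) -> list:
    """Causal mask as a list of rows; mask[q][k] is True if q may attend to k.

    'global' layers see every earlier token; 'local' layers only see the
    most recent `window` tokens (sliding window attention).
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q
            in_window = q - k < window
            row.append(causal and (kind == "global" or in_window))
        mask.append(row)
    return mask


def rms_norm(vector: list, eps: float = 1e-6) -> list:
    """RMS normalization without a learned scale, as a stand-in for the
    normalization applied to the Q and K projections."""
    rms = math.sqrt(sum(x * x for x in vector) / len(vector) + eps)
    return [x / rms for x in vector]


print(layer_schedule(8))
# Number of keys visible to the last query in a 6-token sequence.
print(sum(attention_mask(6, "local", window=2)[5]))
print(sum(attention_mask(6, "global")[5]))
print(rms_norm([3.0, 4.0]))
```

Local layers keep the attended span bounded by the window regardless of sequence length, while the periodic global layers let information flow across the whole context.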

For more details, refer to our technical report, blog, and GitHub.