🚀 GLM-4-32B-0414
The GLM-4-32B-0414 series is a new member of the GLM family with 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very convenient local deployment. The model performs well across many domains, providing users with strong text-processing capabilities.
✨ Key Features
- High performance: GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-oriented synthetic data, laying the foundation for subsequent reinforcement-learning scaling. On several benchmarks, such as code generation and specific question-answering tasks, GLM-4-32B-Base-0414 performs on par with larger models such as GPT-4o and DeepSeek-V3-0324 (671B).
- Multiple model variants: Besides the base model, the family includes GLM-Z1-32B-0414, a reasoning model with deep-thinking capability; GLM-Z1-Rumination-32B-0414, a deep-reasoning model with rumination capability; and GLM-Z1-9B-0414, a small model that excels at mathematical reasoning and general tasks.
- Diverse application scenarios: Strong results in animation generation, web design, SVG generation, search-based writing, and function calling.
📦 Installation
The source documentation does not cover installation steps, so this section is skipped.
💻 Usage Examples
Basic Usage
For search-based writing tasks, use the following system prompt to have the model answer based on search results:
Please answer the user's question based on the search results provided.

## Notes
1. Fully use and organize the collected information instead of simply copy-pasting it, and produce a professional, in-depth answer that meets the user's requirements.
2. When sufficient information is available, make your answer as long as possible; starting from the user's intent, provide a response with ample information and multiple perspectives.
3. In addition, not all search results are closely related to the user's question, so screen, filter, and use them carefully.
4. Answers to objective questions are usually very short; you may add one or two sentences of related information to enrich the content.
5. Make sure your response is well formatted and highly readable. For comparisons or enumerations involving multiple entities, make good use of list formatting to help users understand the information better.
6. Unless the user requests otherwise, answer in the same language as the user's question.
7. Where appropriate, cite search results at the end of sentences using a format such as 【0†source】.
At inference time, search results can be obtained via `RAG`, `WebSearch`, or similar methods and wrapped in an `observation` role, for example:
```json
[
    {
        "role": "user",
        "content": "Explore the common characteristics of children's literature, with a focus on its narrative techniques and thematic tendencies. This includes narrative techniques: common approaches in children's literature such as first-person, third-person, omniscient narrator, and interactive narration, and their influence on young readers. It also includes thematic tendencies: recurring themes in children's literature such as growth, adventure, friendship, and family, with an analysis of how these themes impact children's cognitive and emotional development. Additionally, other universal features such as the use of personification, repetitive language, symbolism and metaphor, and educational value should be considered. Please provide a detailed analytical report based on academic research, classic examples of children's literature, and expert opinions."
    },
    {
        "role": "observation",
        "content": "【{id}†{title}†{url}】\n{content}"
    },
    ...
]
```
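A minimal sketch of packing retrieved results into an `observation` turn follows. Only the `【{id}†{title}†{url}】\n{content}` layout comes from the example above; the `format_observation` helper, the sample results, and the URLs are hypothetical.

```python
# Minimal sketch: wrap hypothetical search results in an "observation" turn.
# Only the 【{id}†{title}†{url}】\n{content} layout comes from the example
# above; helper name and sample data are illustrative.
SEARCH_SYSTEM_PROMPT = "..."  # the search-based writing system prompt shown above

def format_observation(results):
    """Join search results into one observation string, one block per result."""
    return "\n\n".join(
        f"【{i}†{r['title']}†{r['url']}】\n{r['content']}"
        for i, r in enumerate(results)
    )

results = [
    {"title": "Narrative Techniques in Children's Literature",
     "url": "https://example.org/a", "content": "..."},
    {"title": "Recurring Themes: Growth and Adventure",
     "url": "https://example.org/b", "content": "..."},
]

messages = [
    {"role": "system", "content": SEARCH_SYSTEM_PROMPT},
    {"role": "user", "content": "Explore the common characteristics of children's literature..."},
    {"role": "observation", "content": format_observation(results)},
]
```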
Advanced Usage
GLM-4-32B-0414 supports calling external tools in JSON format. The following example uses Hugging Face Transformers to implement tool calling and final response generation:
```python
import ast
import json
import re

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "THUDM/GLM-4-32B-0414"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

def is_function_call(single_message):
    """Determine whether the current assistant message is a function call."""
    pattern = re.compile(r'([^\n`]*?)\n({.*?})(?=\w*\n|$)', re.DOTALL)
    matches = pattern.findall(single_message)
    if not matches:
        return False

    func_name, args_str = matches[0]
    func_name = func_name.strip()
    try:
        parsed_args = json.loads(args_str)
    except json.JSONDecodeError:
        try:
            parsed_args = ast.literal_eval(args_str)
        except (ValueError, SyntaxError):
            return False

    return {"name": func_name, "arguments": parsed_args}

def realtime_aqi(city):
    """Weather query tool (mock implementation returning canned data)."""
    if '北京' in city.lower():
        return json.dumps({'city': '北京', 'aqi': '10', 'unit': 'celsius'}, ensure_ascii=False)
    elif '上海' in city.lower():
        return json.dumps({'city': '上海', 'aqi': '72', 'unit': 'fahrenheit'}, ensure_ascii=False)
    else:
        return json.dumps({'city': city, 'aqi': 'unknown'}, ensure_ascii=False)

def build_system_prompt(tools):
    """Construct the system prompt from the list of available tools."""
    if tools is None:
        tools = []
    value = "# 可用工具"  # "Available tools" header, kept in Chinese to match the model's tool-prompt format
    contents = []
    for tool in tools:
        content = f"\n\n## {tool['function']['name']}\n\n{json.dumps(tool['function'], ensure_ascii=False, indent=4)}"
        # "When calling the function above, use JSON format for the arguments."
        content += "\n在調用上述函數時,請使用 Json 格式表示調用的參數。"
        contents.append(content)
    value += "".join(contents)
    return value

tools = [
    {
        "type": "function",
        "function": {
            "name": "realtime_aqi",
            "description": "天氣預報。獲取即時空氣質量。當前空氣質量,PM2.5,PM10信息",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "description": "城市名"
                    }
                },
                "required": [
                    "city"
                ]
            }
        }
    }
]

system_prompt = build_system_prompt(tools)

message = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "北京和上海今天的天氣情況"}
]
print(f"User Message: {message[-1]['content']}")

while True:
    inputs = tokenizer.apply_chat_template(
        message,
        return_tensors="pt",
        add_generation_prompt=True,
        return_dict=True,
    ).to(model.device)

    generate_kwargs = {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
        "max_new_tokens": 1024,
        "do_sample": True,
    }
    out = model.generate(**generate_kwargs)
    generate_resp = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:-1], skip_special_tokens=False)
    stop_sequence = tokenizer.decode(out[0][-1:], skip_special_tokens=False)

    # The model emits <|user|> when it is done and hands the turn back.
    if stop_sequence == "<|user|>":
        print(f"Assistant Response: {generate_resp.strip()}")
        break

    # Otherwise, parse each assistant segment: tool calls are appended with the
    # function name as metadata, plain text is appended as-is.
    function_calls = []
    for m in generate_resp.split("<|assistant|>"):
        fc_decode = is_function_call(m.strip())
        if fc_decode:
            message.append({"role": "assistant", "metadata": fc_decode['name'],
                            "content": json.dumps(fc_decode['arguments'], ensure_ascii=False)})
            print(f"Function Call: {fc_decode}")
            function_calls.append(fc_decode)
        else:
            message.append({"role": "assistant", "content": m})
            print(f"Assistant Response: {m.strip()}")

    # Execute each requested tool call and feed the result back as an observation.
    for fc in function_calls:
        function_response = realtime_aqi(city=fc["arguments"]["city"])
        print(f"Function Response: {function_response}")
        message.append({"role": "observation", "content": function_response})
```
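For reference, `is_function_call` expects an assistant segment shaped as a function name on one line followed by a JSON object of arguments, per the regex above. A quick sanity check, continuing the snippet above (the raw string is a hypothetical model output, not guaranteed verbatim):

```python
# Hypothetical raw assistant segment in the shape the regex above matches:
# a function-name line followed by a JSON object of arguments.
raw_turn = 'realtime_aqi\n{"city": "北京"}\n'
print(is_function_call(raw_turn))
# Expected: {'name': 'realtime_aqi', 'arguments': {'city': '北京'}}
```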
📚 Detailed Documentation
Model Showcase
Animation Generation
Model | Demo Video | Prompt |
---|---|---|
GLM-Z1-32B-0414 | | write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically |
GLM-4-32B-0414 | | Use HTML to simulate the scenario of a small ball released from the center of a rotating hexagon. Consider the collision between the ball and the hexagon's edges, the gravity acting on the ball, and assume all collisions are perfectly elastic. (Prompt translated from Chinese) |
Web Design
Model | Demo Image | Prompt |
---|---|---|
GLM-4-32B-0414 | | Design a drawing board that supports custom function plotting, allowing adding and deleting custom functions, and assigning colors to functions. (Prompt translated from Chinese) |
GLM-4-32B-0414 | | Design a UI for a mobile machine learning platform, which should include interfaces for training tasks, storage management, and personal statistics. The personal statistics interface should use charts to display the user's resource usage over a period. Use Tailwind CSS to style the page, and display these 3 mobile interfaces tiled on a single HTML page. (Prompt translated from Chinese) |
SVG Generation
Model | Demo Image | Prompt |
---|---|---|
GLM-4-32B-0414 | | Create a misty Jiangnan scene using SVG. (Prompt translated from Chinese) |
GLM-4-32B-0414 | | Use SVG to illustrate the training process of an LLM. (Prompt translated from Chinese) |
Search-Based Writing
For search-based writing tasks, a dedicated system prompt instructs the model to answer based on search results; detailed guidelines and a usage example are given in the Usage Examples section above.
Evaluation Results
GLM-4-0414 Series Model Evaluation
Model | IFEval | BFCL-v3 (Overall) | BFCL-v3 (MultiTurn) | TAU-Bench (Retail) | TAU-Bench (Airline) | SimpleQA | HotpotQA |
---|---|---|---|---|---|---|---|
Qwen2.5-Max | 85.6 | 50.9 | 30.5 | 58.3 | 22.0 | 79.0 | 52.8 |
GPT-4o-1120 | 81.9 | 69.6 | 41.0 | 62.8 | 46.0 | 82.8 | 63.9 |
DeepSeek-V3-0324 | 83.4 | 66.2 | 35.8 | 60.7 | 32.4 | 82.6 | 54.6 |
DeepSeek-R1 | 84.3 | 57.5 | 12.4 | 33.0 | 37.3 | 83.9 | 63.1 |
GLM-4-32B-0414 | 87.6 | 69.6 | 41.5 | 68.7 | 51.2 | 88.1 | 63.8 |
For `SimpleQA` and `HotpotQA`, nearly 500 test cases were sampled from each test set, all models were given basic `search` and `click` tools, other settings were kept identical, and results were averaged over 3 runs.
Evaluation Across Frameworks
Model | Framework | SWE-bench Verified | SWE-bench Verified mini |
---|---|---|---|
GLM-4-32B-0414 | Moatless[1] | 33.8 | 38.0 |
GLM-4-32B-0414 | Agentless[2] | 30.7 | 34.0 |
GLM-4-32B-0414 | OpenHands[3] | 27.2 | 28.0 |
[1] Moatless v0.0.3 was used with the following parameters: response_format="react", thoughts_in_action=False, max_iterations=30. Failed trajectories were not retried; other settings were left at their defaults.
[2] Agentless v1.5.0 used BGE as the embedding model and FAISS for similarity search. To speed up patch verification while preserving performance, the per-instance run timeout was changed from the default 300s to 180s.
[3] OpenHands v0.29.1 did not use YaRN context extension, but limited runs to a maximum of 60 iterations and summarized the history to avoid exceeding the 32K context limit. Summarization was configured with llm_config="condenser", keep_first=1, max_size=32. Failed trajectories were not retried.
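For convenience, the footnote settings can be collected in one place. The dicts below are a plain transcription of the parameters in footnotes [1]-[3], purely illustrative, not a real configuration API for any of the three frameworks:

```python
# Transcription of the evaluation settings in footnotes [1]-[3].
# Illustrative only; not an actual Moatless/Agentless/OpenHands API.
EVAL_SETTINGS = {
    "moatless": {"version": "0.0.3", "response_format": "react",
                 "thoughts_in_action": False, "max_iterations": 30,
                 "retry_failed": False},
    "agentless": {"version": "1.5.0", "embedding_model": "BGE",
                  "similarity_search": "FAISS", "run_timeout_s": 180},
    "openhands": {"version": "0.29.1", "yarn_context_extension": False,
                  "max_iterations": 60,
                  "condenser": {"llm_config": "condenser", "keep_first": 1, "max_size": 32},
                  "retry_failed": False},
}
```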
🔧 Technical Details
The source documentation does not provide specific implementation details, so this section is skipped.
📄 License
This project is licensed under the MIT License.