🚀 GLM-4-32B-0414
The GLM-4-32B-0414 series is a new member of the GLM family, with 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very convenient local deployment. The model performs well across many domains, giving users strong text-processing capabilities.
✨ Key Features
- High performance: GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-oriented synthetic data, laying the foundation for subsequent reinforcement-learning extensions. On several benchmarks, such as code generation and certain Q&A tasks, GLM-4-32B-Base-0414 performs on par with much larger models such as GPT-4o and DeepSeek-V3-0324 (671B).
- Multiple model variants: Besides the base model, the series includes GLM-Z1-32B-0414, a reasoning model with deep-thinking capability; GLM-Z1-Rumination-32B-0414, a deep-reasoning model with rumination capability; and GLM-Z1-9B-0414, a small model that performs well on mathematical reasoning and general tasks.
- Diverse application scenarios: The models perform well on animation generation, web design, SVG generation, search-based writing, and function calling.
📦 Installation
The source documentation does not describe installation steps, so this section is skipped. (Note that the examples below assume `transformers` and PyTorch are available.)
💻 Usage Examples
Basic Usage
For search-based writing tasks, use the following system prompt (translated here from the original Chinese) to have the model answer based on the search results:
```
Please answer the user's question based on the returned search results.

## Notes
1. Make full use of and organize the collected information instead of simply copying and pasting it, and produce a professional, in-depth answer that meets the user's requirements.
2. When the information provided is sufficient, make your answer as long as possible; starting from the user's intent, give a reply that is informative and covers multiple angles.
3. Not all search results are closely related to the user's question: screen, filter, and use them carefully.
4. Answers to objective questions are usually very short; you may add one or two sentences of related information to enrich the content.
5. Make sure your reply is well formatted and readable. For comparisons or enumerations of multiple entities, use list formats to help the user understand the information better.
6. Unless the user requests otherwise, reply in the same language the user asked in.
7. Where appropriate, cite search results at the end of sentences, e.g. 【0†source】.
```
In practice, search results can be obtained through methods such as RAG or WebSearch and wrapped in an `observation` message, for example:
```json
[
    {
        "role": "user",
        "content": "Explore the common characteristics of children's literature, with a focus on its narrative techniques and thematic tendencies. This includes narrative techniques: common approaches in children's literature such as first-person, third-person, omniscient narrator, and interactive narration, and their influence on young readers. It also includes thematic tendencies: recurring themes in children's literature such as growth, adventure, friendship, and family, with an analysis of how these themes impact children's cognitive and emotional development. Additionally, other universal features such as the use of personification, repetitive language, symbolism and metaphor, and educational value should be considered. Please provide a detailed analytical report based on academic research, classic examples of children's literature, and expert opinions."
    },
    {
        "role": "observation",
        "content": "【{id}†{title}†{url}】\n{content}"
    },
    ...
]
```
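To make the message format concrete, the sketch below packs retrieved documents into the `observation` content string in the `【{id}†{title}†{url}】` format shown above. This is an illustration rather than part of the original card; the `search_results` data and the truncated system prompt string are placeholders:

```python
# Minimal sketch; search_results stands in for the output of a RAG/WebSearch step.
search_results = [
    {"title": "Narrative Techniques in Children's Literature",
     "url": "https://example.com/1", "content": "..."},
    {"title": "Recurring Themes and Child Development",
     "url": "https://example.com/2", "content": "..."},
]

# Each document becomes one 【{id}†{title}†{url}】\n{content} entry.
observation = "\n\n".join(
    f"【{i}†{doc['title']}†{doc['url']}】\n{doc['content']}"
    for i, doc in enumerate(search_results)
)

system_prompt = "..."  # paste the search-writing system prompt shown above

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explore the common characteristics of children's literature ..."},
    {"role": "observation", "content": observation},
]

# Citations such as 【0†source】 in the reply map back to search_results[0]["url"].
```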
Advanced Usage
GLM-4-32B-0414 supports calling external tools in JSON format. The following example uses HuggingFace Transformers to implement tool calling and final-response generation: the model generates until it either emits a tool call (which is parsed, executed, and appended as an `observation` message before generation continues) or stops at the `<|user|>` token, which marks the final answer.
```python
import ast
import json
import re

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "THUDM/GLM-4-32B-0414"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")


def is_function_call(single_message):
    """Determine whether the current assistant message is a function call."""
    # A tool call is the function name on one line followed by a JSON object
    # with the arguments on the next.
    pattern = re.compile(r'([^\n`]*?)\n({.*?})(?=\w*\n|$)', re.DOTALL)
    matches = pattern.findall(single_message)
    if not matches:
        return False

    func_name, args_str = matches[0]
    func_name = func_name.strip()
    try:
        parsed_args = json.loads(args_str)
    except json.JSONDecodeError:
        try:
            parsed_args = ast.literal_eval(args_str)
        except (ValueError, SyntaxError):
            return False

    return {"name": func_name, "arguments": parsed_args}


def realtime_aqi(city):
    """Weather query tool (mock implementation with fixed responses)."""
    if '北京' in city.lower():  # Beijing
        return json.dumps({'city': '北京', 'aqi': '10', 'unit': 'celsius'}, ensure_ascii=False)
    elif '上海' in city.lower():  # Shanghai
        return json.dumps({'city': '上海', 'aqi': '72', 'unit': 'fahrenheit'}, ensure_ascii=False)
    else:
        return json.dumps({'city': city, 'aqi': 'unknown'}, ensure_ascii=False)


def build_system_prompt(tools):
    """Construct the system prompt from the list of available tools."""
    if tools is None:
        tools = []
    value = "# 可用工具"  # "# Available tools"
    contents = []
    for tool in tools:
        content = f"\n\n## {tool['function']['name']}\n\n{json.dumps(tool['function'], ensure_ascii=False, indent=4)}"
        # "When calling the function above, use JSON format for the arguments."
        content += "\n在调用上述函数时,请使用 Json 格式表示调用的参数。"
        contents.append(content)
    value += "".join(contents)
    return value


# Tool schema; the description is kept in Chinese to match the Chinese prompt.
tools = [
    {
        "type": "function",
        "function": {
            "name": "realtime_aqi",
            "description": "天气预报。获取实时空气质量。当前空气质量,PM2.5,PM10信息",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "description": "城市名"
                    }
                },
                "required": [
                    "city"
                ]
            }
        }
    }
]

system_prompt = build_system_prompt(tools)

# User asks: "What is the weather in Beijing and Shanghai today?"
message = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "北京和上海今天的天气情况"}
]
print(f"User Message: {message[-1]['content']}")

while True:
    inputs = tokenizer.apply_chat_template(
        message,
        return_tensors="pt",
        add_generation_prompt=True,
        return_dict=True,
    ).to(model.device)

    generate_kwargs = {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
        "max_new_tokens": 1024,
        "do_sample": True,
    }
    out = model.generate(**generate_kwargs)
    generate_resp = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:-1], skip_special_tokens=False)
    stop_sequence = tokenizer.decode(out[0][-1:], skip_special_tokens=False)

    # Stopping at <|user|> means the model has produced its final answer.
    if stop_sequence == "<|user|>":
        print(f"Assistant Response: {generate_resp.strip()}")
        break

    # Otherwise, parse any tool calls out of the assistant output.
    function_calls = []
    for m in generate_resp.split("<|assistant|>"):
        fc_decode = is_function_call(m.strip())
        if fc_decode:
            message.append({"role": "assistant", "metadata": fc_decode['name'],
                            "content": json.dumps(fc_decode['arguments'], ensure_ascii=False)})
            print(f"Function Call: {fc_decode}")
            function_calls.append(fc_decode)
        else:
            message.append({"role": "assistant", "content": m})
            print(f"Assistant Response: {m.strip()}")

    # Execute each tool call and feed the result back as an observation.
    for fc in function_calls:
        function_response = realtime_aqi(
            city=fc["arguments"]["city"],
        )
        print(f"Function Response: {function_response}")
        message.append({"role": "observation", "content": function_response})
```
📚 Documentation
Model Showcase
Animation Generation
| Model | Example Video | Prompt |
| --- | --- | --- |
| GLM-Z1-32B-0414 | | write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically |
| GLM-4-32B-0414 | | Use HTML to simulate the scenario of a small ball released from the center of a rotating hexagon. Consider the collision between the ball and the hexagon's edges, the gravity acting on the ball, and assume all collisions are perfectly elastic. (Prompt translated from Chinese) |
Web Design
| Model | Example Image | Prompt |
| --- | --- | --- |
| GLM-4-32B-0414 | | Design a drawing board that supports custom function plotting, allowing adding and deleting custom functions, and assigning colors to functions. (Prompt translated from Chinese) |
| GLM-4-32B-0414 | | Design a UI for a mobile machine learning platform, which should include interfaces for training tasks, storage management, and personal statistics. The personal statistics interface should use charts to display the user's resource usage over a period. Use Tailwind CSS to style the page, and display these 3 mobile interfaces tiled on a single HTML page. (Prompt translated from Chinese) |
SVG Generation
| Model | Example Image | Prompt |
| --- | --- | --- |
| GLM-4-32B-0414 | | Create a misty Jiangnan scene using SVG. (Prompt translated from Chinese) |
| GLM-4-32B-0414 | | Use SVG to illustrate the training process of an LLM. (Prompt translated from Chinese) |
Search-Based Writing
For search-based writing tasks, a dedicated system prompt makes the model answer based on search results; the full prompt, its notes, and a message-format example are given in the Basic Usage section above.
Evaluation Results
Evaluation of the GLM-4-0414 series
| Model | IFEval | BFCL-v3 (Overall) | BFCL-v3 (MultiTurn) | TAU-Bench (Retail) | TAU-Bench (Airline) | SimpleQA | HotpotQA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5-Max | 85.6 | 50.9 | 30.5 | 58.3 | 22.0 | 79.0 | 52.8 |
| GPT-4o-1120 | 81.9 | 69.6 | 41.0 | 62.8 | 46.0 | 82.8 | 63.9 |
| DeepSeek-V3-0324 | 83.4 | 66.2 | 35.8 | 60.7 | 32.4 | 82.6 | 54.6 |
| DeepSeek-R1 | 84.3 | 57.5 | 12.4 | 33.0 | 37.3 | 83.9 | 63.1 |
| GLM-4-32B-0414 | 87.6 | 69.6 | 41.5 | 68.7 | 51.2 | 88.1 | 63.8 |
For SimpleQA and HotpotQA, nearly 500 test cases were sampled from each test set. All models were given the same basic `search` and `click` tools, all other settings were kept identical, and results were averaged over 3 runs.
Evaluation under different frameworks
| Model | Framework | SWE-bench Verified | SWE-bench Verified mini |
| --- | --- | --- | --- |
| GLM-4-32B-0414 | Moatless[1] | 33.8 | 38.0 |
| GLM-4-32B-0414 | Agentless[2] | 30.7 | 34.0 |
| GLM-4-32B-0414 | OpenHands[3] | 27.2 | 28.0 |
[1] Moatless v0.0.3 was run with `response_format="react", thoughts_in_action=False, max_iterations=30`; failed trajectories are not retried, and other settings are left at their defaults.
[2] Agentless v1.5.0 uses BGE as the embedding model and FAISS for similarity search (a rough sketch of this retrieval setup follows these notes). To speed up patch verification while preserving performance, the per-instance run timeout was reduced from the default 300s to 180s.
[3] OpenHands v0.29.1 does not use YaRN context extension, but limits runs to a maximum of 60 iterations and summarizes the history to avoid exceeding the 32K context limit, with the summarizer configured as `llm_config="condenser", keep_first=1, max_size=32`; failed trajectories are not retried.
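As a rough illustration of the retrieval setup footnote [2] describes (BGE embeddings plus FAISS similarity search), here is a minimal sketch. It is not code from the evaluation harness; the checkpoint name and the indexed snippets are assumptions:

```python
import faiss  # pip install faiss-cpu
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed BGE checkpoint; the card does not say which BGE variant was used.
encoder = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Hypothetical code snippets standing in for repository chunks.
docs = ["def parse_config(path): ...", "class PatchValidator: ..."]
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

# Inner-product index equals cosine similarity on normalized embeddings.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype=np.float32))

query_vec = encoder.encode(["where is the config file parsed?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype=np.float32), k=2)
print(ids[0], scores[0])  # ranked document indices and similarity scores
```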
🔧 Technical Details
The source documentation does not provide specific implementation details, so this section is skipped.
📄 License
This project is licensed under the MIT License.



