🚀 GLM-4-32B-0414
The GLM-4-32B-0414 series is a new member of the GLM family, with 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very convenient local deployment. The model performs well across many domains, giving users strong text-processing capabilities.
✨ Key Features
- High performance: GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-oriented synthetic data, laying the foundation for subsequent reinforcement-learning extensions. On several benchmarks, such as code generation and certain Q&A tasks, GLM-4-32B-Base-0414 performs on par with much larger models such as GPT-4o and DeepSeek-V3-0324 (671B).
- Multiple model variants: Besides the base model, the series includes GLM-Z1-32B-0414, a reasoning model with deep-thinking capability; GLM-Z1-Rumination-32B-0414, a deep-reasoning model with rumination capability; and GLM-Z1-9B-0414, a small model that performs well on mathematical reasoning and general tasks.
- Diverse application scenarios: The models perform well on animation generation, web design, SVG generation, search-based writing, and function calling.
📦 Installation
The source documentation does not describe installation steps, so this section is skipped. (Note that the examples below assume `transformers` and PyTorch are available.)
💻 Usage Examples
Basic Usage
For search-based writing tasks, use the following system prompt (translated here from the original Chinese) to have the model answer based on the search results:
```
Please answer the user's question based on the returned search results.

## Notes
1. Make full use of and organize the collected information instead of simply copying and pasting it, and produce a professional, in-depth answer that meets the user's requirements.
2. When the information provided is sufficient, make your answer as long as possible; starting from the user's intent, give a reply that is informative and covers multiple angles.
3. Not all search results are closely related to the user's question: screen, filter, and use them carefully.
4. Answers to objective questions are usually very short; you may add one or two sentences of related information to enrich the content.
5. Make sure your reply is well formatted and readable. For comparisons or enumerations of multiple entities, use list formats to help the user understand the information better.
6. Unless the user requests otherwise, reply in the same language the user asked in.
7. Where appropriate, cite search results at the end of sentences, e.g. 【0†source】.
```
In practice, search results can be obtained through methods such as RAG or WebSearch and wrapped in an `observation` message, for example:
```json
[
    {
        "role": "user",
        "content": "Explore the common characteristics of children's literature, with a focus on its narrative techniques and thematic tendencies. This includes narrative techniques: common approaches in children's literature such as first-person, third-person, omniscient narrator, and interactive narration, and their influence on young readers. It also includes thematic tendencies: recurring themes in children's literature such as growth, adventure, friendship, and family, with an analysis of how these themes impact children's cognitive and emotional development. Additionally, other universal features such as the use of personification, repetitive language, symbolism and metaphor, and educational value should be considered. Please provide a detailed analytical report based on academic research, classic examples of children's literature, and expert opinions."
    },
    {
        "role": "observation",
        "content": "【{id}†{title}†{url}】\n{content}"
    },
    ...
]
```
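To make the message format concrete, the sketch below packs retrieved documents into the `observation` content string in the `【{id}†{title}†{url}】` format shown above. This is an illustration rather than part of the original card; the `search_results` data and the truncated system prompt string are placeholders:

```python
# Minimal sketch; search_results stands in for the output of a RAG/WebSearch step.
search_results = [
    {"title": "Narrative Techniques in Children's Literature",
     "url": "https://example.com/1", "content": "..."},
    {"title": "Recurring Themes and Child Development",
     "url": "https://example.com/2", "content": "..."},
]

# Each document becomes one 【{id}†{title}†{url}】\n{content} entry.
observation = "\n\n".join(
    f"【{i}†{doc['title']}†{doc['url']}】\n{doc['content']}"
    for i, doc in enumerate(search_results)
)

system_prompt = "..."  # paste the search-writing system prompt shown above

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explore the common characteristics of children's literature ..."},
    {"role": "observation", "content": observation},
]

# Citations such as 【0†source】 in the reply map back to search_results[0]["url"].
```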
Advanced Usage
GLM-4-32B-0414 supports calling external tools in JSON format. The following example uses HuggingFace Transformers to implement tool calling and final-response generation: the model generates until it either emits a tool call (which is parsed, executed, and appended as an `observation` message before generation continues) or stops at the `<|user|>` token, which marks the final answer.
```python
import ast
import json
import re

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "THUDM/GLM-4-32B-0414"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")


def is_function_call(single_message):
    """Determine whether the current assistant message is a function call."""
    # A tool call is the function name on one line followed by a JSON object
    # with the arguments on the next.
    pattern = re.compile(r'([^\n`]*?)\n({.*?})(?=\w*\n|$)', re.DOTALL)
    matches = pattern.findall(single_message)
    if not matches:
        return False

    func_name, args_str = matches[0]
    func_name = func_name.strip()
    try:
        parsed_args = json.loads(args_str)
    except json.JSONDecodeError:
        try:
            parsed_args = ast.literal_eval(args_str)
        except (ValueError, SyntaxError):
            return False

    return {"name": func_name, "arguments": parsed_args}


def realtime_aqi(city):
    """Weather query tool (mock implementation with fixed responses)."""
    if '北京' in city.lower():  # Beijing
        return json.dumps({'city': '北京', 'aqi': '10', 'unit': 'celsius'}, ensure_ascii=False)
    elif '上海' in city.lower():  # Shanghai
        return json.dumps({'city': '上海', 'aqi': '72', 'unit': 'fahrenheit'}, ensure_ascii=False)
    else:
        return json.dumps({'city': city, 'aqi': 'unknown'}, ensure_ascii=False)


def build_system_prompt(tools):
    """Construct the system prompt from the list of available tools."""
    if tools is None:
        tools = []
    value = "# 可用工具"  # "# Available tools"
    contents = []
    for tool in tools:
        content = f"\n\n## {tool['function']['name']}\n\n{json.dumps(tool['function'], ensure_ascii=False, indent=4)}"
        # "When calling the function above, use JSON format for the arguments."
        content += "\n在调用上述函数时,请使用 Json 格式表示调用的参数。"
        contents.append(content)
    value += "".join(contents)
    return value


# Tool schema; the description is kept in Chinese to match the Chinese prompt.
tools = [
    {
        "type": "function",
        "function": {
            "name": "realtime_aqi",
            "description": "天气预报。获取实时空气质量。当前空气质量,PM2.5,PM10信息",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "description": "城市名"
                    }
                },
                "required": [
                    "city"
                ]
            }
        }
    }
]

system_prompt = build_system_prompt(tools)

# User asks: "What is the weather in Beijing and Shanghai today?"
message = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "北京和上海今天的天气情况"}
]
print(f"User Message: {message[-1]['content']}")

while True:
    inputs = tokenizer.apply_chat_template(
        message,
        return_tensors="pt",
        add_generation_prompt=True,
        return_dict=True,
    ).to(model.device)

    generate_kwargs = {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
        "max_new_tokens": 1024,
        "do_sample": True,
    }
    out = model.generate(**generate_kwargs)
    generate_resp = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:-1], skip_special_tokens=False)
    stop_sequence = tokenizer.decode(out[0][-1:], skip_special_tokens=False)

    # Stopping at <|user|> means the model has produced its final answer.
    if stop_sequence == "<|user|>":
        print(f"Assistant Response: {generate_resp.strip()}")
        break

    # Otherwise, parse any tool calls out of the assistant output.
    function_calls = []
    for m in generate_resp.split("<|assistant|>"):
        fc_decode = is_function_call(m.strip())
        if fc_decode:
            message.append({"role": "assistant", "metadata": fc_decode['name'],
                            "content": json.dumps(fc_decode['arguments'], ensure_ascii=False)})
            print(f"Function Call: {fc_decode}")
            function_calls.append(fc_decode)
        else:
            message.append({"role": "assistant", "content": m})
            print(f"Assistant Response: {m.strip()}")

    # Execute each tool call and feed the result back as an observation.
    for fc in function_calls:
        function_response = realtime_aqi(
            city=fc["arguments"]["city"],
        )
        print(f"Function Response: {function_response}")
        message.append({"role": "observation", "content": function_response})
```
📚 Documentation
Model Showcase
Animation Generation
| Model | Example Video | Prompt |
| --- | --- | --- |
| GLM-Z1-32B-0414 | | write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically |
| GLM-4-32B-0414 | | Use HTML to simulate the scenario of a small ball released from the center of a rotating hexagon. Consider the collision between the ball and the hexagon's edges, the gravity acting on the ball, and assume all collisions are perfectly elastic. (Prompt translated from Chinese) |
Web Design
| Model | Example Image | Prompt |
| --- | --- | --- |
| GLM-4-32B-0414 | | Design a drawing board that supports custom function plotting, allowing adding and deleting custom functions, and assigning colors to functions. (Prompt translated from Chinese) |
| GLM-4-32B-0414 | | Design a UI for a mobile machine learning platform, which should include interfaces for training tasks, storage management, and personal statistics. The personal statistics interface should use charts to display the user's resource usage over a period. Use Tailwind CSS to style the page, and display these 3 mobile interfaces tiled on a single HTML page. (Prompt translated from Chinese) |
SVG Generation
| Model | Example Image | Prompt |
| --- | --- | --- |
| GLM-4-32B-0414 | | Create a misty Jiangnan scene using SVG. (Prompt translated from Chinese) |
| GLM-4-32B-0414 | | Use SVG to illustrate the training process of an LLM. (Prompt translated from Chinese) |
Search-Based Writing
For search-based writing tasks, a dedicated system prompt makes the model answer based on search results; the full prompt, its notes, and a message-format example are given in the Basic Usage section above.
Evaluation Results
Evaluation of the GLM-4-0414 series
| Model | IFEval | BFCL-v3 (Overall) | BFCL-v3 (MultiTurn) | TAU-Bench (Retail) | TAU-Bench (Airline) | SimpleQA | HotpotQA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5-Max | 85.6 | 50.9 | 30.5 | 58.3 | 22.0 | 79.0 | 52.8 |
| GPT-4o-1120 | 81.9 | 69.6 | 41.0 | 62.8 | 46.0 | 82.8 | 63.9 |
| DeepSeek-V3-0324 | 83.4 | 66.2 | 35.8 | 60.7 | 32.4 | 82.6 | 54.6 |
| DeepSeek-R1 | 84.3 | 57.5 | 12.4 | 33.0 | 37.3 | 83.9 | 63.1 |
| GLM-4-32B-0414 | 87.6 | 69.6 | 41.5 | 68.7 | 51.2 | 88.1 | 63.8 |
For SimpleQA and HotpotQA, nearly 500 test cases were sampled from each test set. All models were given the same basic `search` and `click` tools, all other settings were kept identical, and results were averaged over 3 runs.
Evaluation under different frameworks
| Model | Framework | SWE-bench Verified | SWE-bench Verified mini |
| --- | --- | --- | --- |
| GLM-4-32B-0414 | Moatless[1] | 33.8 | 38.0 |
| GLM-4-32B-0414 | Agentless[2] | 30.7 | 34.0 |
| GLM-4-32B-0414 | OpenHands[3] | 27.2 | 28.0 |
[1] Moatless v0.0.3 was run with `response_format="react", thoughts_in_action=False, max_iterations=30`; failed trajectories are not retried, and other settings are left at their defaults.
[2] Agentless v1.5.0 uses BGE as the embedding model and FAISS for similarity search (a rough sketch of this retrieval setup follows these notes). To speed up patch verification while preserving performance, the per-instance run timeout was reduced from the default 300s to 180s.
[3] OpenHands v0.29.1 does not use YaRN context extension, but limits runs to a maximum of 60 iterations and summarizes the history to avoid exceeding the 32K context limit, with the summarizer configured as `llm_config="condenser", keep_first=1, max_size=32`; failed trajectories are not retried.
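As a rough illustration of the retrieval setup footnote [2] describes (BGE embeddings plus FAISS similarity search), here is a minimal sketch. It is not code from the evaluation harness; the checkpoint name and the indexed snippets are assumptions:

```python
import faiss  # pip install faiss-cpu
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed BGE checkpoint; the card does not say which BGE variant was used.
encoder = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Hypothetical code snippets standing in for repository chunks.
docs = ["def parse_config(path): ...", "class PatchValidator: ..."]
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

# Inner-product index equals cosine similarity on normalized embeddings.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype=np.float32))

query_vec = encoder.encode(["where is the config file parsed?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype=np.float32), k=2)
print(ids[0], scores[0])  # ranked document indices and similarity scores
```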
🔧 Technical Details
The source documentation does not provide specific implementation details, so this section is skipped.
📄 License
This project is licensed under the MIT License.



