🚀 GLM-4-32B-0414
The GLM-4-32B-0414 series is a new member of the GLM family with 32 billion parameters. Its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports very convenient local deployment. The model performs well across many domains, providing users with strong text-processing capabilities.
✨ Key Features
- High performance: GLM-4-32B-Base-0414 was pre-trained on 15T of high-quality data, including a large amount of reasoning-oriented synthetic data, laying the foundation for subsequent reinforcement-learning scaling. On several benchmarks, such as code generation and specific question-answering tasks, GLM-4-32B-Base-0414 performs on par with larger models such as GPT-4o and DeepSeek-V3-0324 (671B).
- Multiple model variants: Besides the base model, the family includes GLM-Z1-32B-0414, a reasoning model with deep-thinking capability; GLM-Z1-Rumination-32B-0414, a deep-reasoning model with rumination capability; and GLM-Z1-9B-0414, a small model that excels at mathematical reasoning and general tasks.
- Diverse application scenarios: Strong results in animation generation, web design, SVG generation, search-based writing, and function calling.
📦 Installation
The source documentation does not cover installation steps, so this section is skipped.
💻 Usage Examples
Basic Usage
For search-based writing tasks, use the following system prompt to have the model answer based on search results:
Please answer the user's question based on the search results provided.

## Notes
1. Fully use and organize the collected information instead of simply copy-pasting it, and produce a professional, in-depth answer that meets the user's requirements.
2. When sufficient information is available, make your answer as long as possible; starting from the user's intent, provide a response with ample information and multiple perspectives.
3. In addition, not all search results are closely related to the user's question, so screen, filter, and use them carefully.
4. Answers to objective questions are usually very short; you may add one or two sentences of related information to enrich the content.
5. Make sure your response is well formatted and highly readable. For comparisons or enumerations involving multiple entities, make good use of list formatting to help users understand the information better.
6. Unless the user requests otherwise, answer in the same language as the user's question.
7. Where appropriate, cite search results at the end of sentences using a format such as 【0†source】.
At inference time, search results can be obtained via `RAG`, `WebSearch`, or similar methods and wrapped in an `observation` role, for example:
```json
[
    {
        "role": "user",
        "content": "Explore the common characteristics of children's literature, with a focus on its narrative techniques and thematic tendencies. This includes narrative techniques: common approaches in children's literature such as first-person, third-person, omniscient narrator, and interactive narration, and their influence on young readers. It also includes thematic tendencies: recurring themes in children's literature such as growth, adventure, friendship, and family, with an analysis of how these themes impact children's cognitive and emotional development. Additionally, other universal features such as the use of personification, repetitive language, symbolism and metaphor, and educational value should be considered. Please provide a detailed analytical report based on academic research, classic examples of children's literature, and expert opinions."
    },
    {
        "role": "observation",
        "content": "【{id}†{title}†{url}】\n{content}"
    },
    ...
]
```
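A minimal sketch of packing retrieved results into an `observation` turn follows. Only the `【{id}†{title}†{url}】\n{content}` layout comes from the example above; the `format_observation` helper, the sample results, and the URLs are hypothetical.

```python
# Minimal sketch: wrap hypothetical search results in an "observation" turn.
# Only the 【{id}†{title}†{url}】\n{content} layout comes from the example
# above; helper name and sample data are illustrative.
SEARCH_SYSTEM_PROMPT = "..."  # the search-based writing system prompt shown above

def format_observation(results):
    """Join search results into one observation string, one block per result."""
    return "\n\n".join(
        f"【{i}†{r['title']}†{r['url']}】\n{r['content']}"
        for i, r in enumerate(results)
    )

results = [
    {"title": "Narrative Techniques in Children's Literature",
     "url": "https://example.org/a", "content": "..."},
    {"title": "Recurring Themes: Growth and Adventure",
     "url": "https://example.org/b", "content": "..."},
]

messages = [
    {"role": "system", "content": SEARCH_SYSTEM_PROMPT},
    {"role": "user", "content": "Explore the common characteristics of children's literature..."},
    {"role": "observation", "content": format_observation(results)},
]
```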
Advanced Usage
GLM-4-32B-0414 supports calling external tools in JSON format. The following example uses Hugging Face Transformers to implement tool calling and final response generation:
```python
import ast
import json
import re

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "THUDM/GLM-4-32B-0414"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

def is_function_call(single_message):
    """Determine whether the current assistant message is a function call."""
    pattern = re.compile(r'([^\n`]*?)\n({.*?})(?=\w*\n|$)', re.DOTALL)
    matches = pattern.findall(single_message)
    if not matches:
        return False

    func_name, args_str = matches[0]
    func_name = func_name.strip()
    try:
        parsed_args = json.loads(args_str)
    except json.JSONDecodeError:
        try:
            parsed_args = ast.literal_eval(args_str)
        except (ValueError, SyntaxError):
            return False

    return {"name": func_name, "arguments": parsed_args}

def realtime_aqi(city):
    """Weather query tool (mock implementation returning canned data)."""
    if '北京' in city.lower():
        return json.dumps({'city': '北京', 'aqi': '10', 'unit': 'celsius'}, ensure_ascii=False)
    elif '上海' in city.lower():
        return json.dumps({'city': '上海', 'aqi': '72', 'unit': 'fahrenheit'}, ensure_ascii=False)
    else:
        return json.dumps({'city': city, 'aqi': 'unknown'}, ensure_ascii=False)

def build_system_prompt(tools):
    """Construct the system prompt from the list of available tools."""
    if tools is None:
        tools = []
    value = "# 可用工具"  # "Available tools" header, kept in Chinese to match the model's tool-prompt format
    contents = []
    for tool in tools:
        content = f"\n\n## {tool['function']['name']}\n\n{json.dumps(tool['function'], ensure_ascii=False, indent=4)}"
        # "When calling the function above, use JSON format for the arguments."
        content += "\n在調用上述函數時,請使用 Json 格式表示調用的參數。"
        contents.append(content)
    value += "".join(contents)
    return value

tools = [
    {
        "type": "function",
        "function": {
            "name": "realtime_aqi",
            "description": "天氣預報。獲取即時空氣質量。當前空氣質量,PM2.5,PM10信息",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "description": "城市名"
                    }
                },
                "required": [
                    "city"
                ]
            }
        }
    }
]

system_prompt = build_system_prompt(tools)

message = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "北京和上海今天的天氣情況"}
]
print(f"User Message: {message[-1]['content']}")

while True:
    inputs = tokenizer.apply_chat_template(
        message,
        return_tensors="pt",
        add_generation_prompt=True,
        return_dict=True,
    ).to(model.device)

    generate_kwargs = {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
        "max_new_tokens": 1024,
        "do_sample": True,
    }
    out = model.generate(**generate_kwargs)
    generate_resp = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:-1], skip_special_tokens=False)
    stop_sequence = tokenizer.decode(out[0][-1:], skip_special_tokens=False)

    # The model emits <|user|> when it is done and hands the turn back.
    if stop_sequence == "<|user|>":
        print(f"Assistant Response: {generate_resp.strip()}")
        break

    # Otherwise, parse each assistant segment: tool calls are appended with the
    # function name as metadata, plain text is appended as-is.
    function_calls = []
    for m in generate_resp.split("<|assistant|>"):
        fc_decode = is_function_call(m.strip())
        if fc_decode:
            message.append({"role": "assistant", "metadata": fc_decode['name'],
                            "content": json.dumps(fc_decode['arguments'], ensure_ascii=False)})
            print(f"Function Call: {fc_decode}")
            function_calls.append(fc_decode)
        else:
            message.append({"role": "assistant", "content": m})
            print(f"Assistant Response: {m.strip()}")

    # Execute each requested tool call and feed the result back as an observation.
    for fc in function_calls:
        function_response = realtime_aqi(city=fc["arguments"]["city"])
        print(f"Function Response: {function_response}")
        message.append({"role": "observation", "content": function_response})
```
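For reference, `is_function_call` expects an assistant segment shaped as a function name on one line followed by a JSON object of arguments, per the regex above. A quick sanity check, continuing the snippet above (the raw string is a hypothetical model output, not guaranteed verbatim):

```python
# Hypothetical raw assistant segment in the shape the regex above matches:
# a function-name line followed by a JSON object of arguments.
raw_turn = 'realtime_aqi\n{"city": "北京"}\n'
print(is_function_call(raw_turn))
# Expected: {'name': 'realtime_aqi', 'arguments': {'city': '北京'}}
```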
📚 Detailed Documentation
Model Showcase
Animation Generation
Model | Demo Video | Prompt |
---|---|---|
GLM-Z1-32B-0414 | | write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically |
GLM-4-32B-0414 | | Use HTML to simulate the scenario of a small ball released from the center of a rotating hexagon. Consider the collision between the ball and the hexagon's edges, the gravity acting on the ball, and assume all collisions are perfectly elastic. (Prompt translated from Chinese) |
Web Design
Model | Demo Image | Prompt |
---|---|---|
GLM-4-32B-0414 | | Design a drawing board that supports custom function plotting, allowing adding and deleting custom functions, and assigning colors to functions. (Prompt translated from Chinese) |
GLM-4-32B-0414 | | Design a UI for a mobile machine learning platform, which should include interfaces for training tasks, storage management, and personal statistics. The personal statistics interface should use charts to display the user's resource usage over a period. Use Tailwind CSS to style the page, and display these 3 mobile interfaces tiled on a single HTML page. (Prompt translated from Chinese) |
SVG Generation
Model | Demo Image | Prompt |
---|---|---|
GLM-4-32B-0414 | | Create a misty Jiangnan scene using SVG. (Prompt translated from Chinese) |
GLM-4-32B-0414 | | Use SVG to illustrate the training process of an LLM. (Prompt translated from Chinese) |
Search-Based Writing
For search-based writing tasks, a dedicated system prompt instructs the model to answer based on search results; detailed guidelines and a usage example are given in the Usage Examples section above.
Evaluation Results
GLM-4-0414 Series Model Evaluation
Model | IFEval | BFCL-v3 (Overall) | BFCL-v3 (MultiTurn) | TAU-Bench (Retail) | TAU-Bench (Airline) | SimpleQA | HotpotQA |
---|---|---|---|---|---|---|---|
Qwen2.5-Max | 85.6 | 50.9 | 30.5 | 58.3 | 22.0 | 79.0 | 52.8 |
GPT-4o-1120 | 81.9 | 69.6 | 41.0 | 62.8 | 46.0 | 82.8 | 63.9 |
DeepSeek-V3-0324 | 83.4 | 66.2 | 35.8 | 60.7 | 32.4 | 82.6 | 54.6 |
DeepSeek-R1 | 84.3 | 57.5 | 12.4 | 33.0 | 37.3 | 83.9 | 63.1 |
GLM-4-32B-0414 | 87.6 | 69.6 | 41.5 | 68.7 | 51.2 | 88.1 | 63.8 |
For `SimpleQA` and `HotpotQA`, nearly 500 test cases were sampled from each test set, all models were given basic `search` and `click` tools, other settings were kept identical, and results were averaged over 3 runs.
Evaluation Across Frameworks
Model | Framework | SWE-bench Verified | SWE-bench Verified mini |
---|---|---|---|
GLM-4-32B-0414 | Moatless[1] | 33.8 | 38.0 |
GLM-4-32B-0414 | Agentless[2] | 30.7 | 34.0 |
GLM-4-32B-0414 | OpenHands[3] | 27.2 | 28.0 |
[1] Moatless v0.0.3 was used with the following parameters: response_format="react", thoughts_in_action=False, max_iterations=30. Failed trajectories were not retried; other settings were left at their defaults.
[2] Agentless v1.5.0 used BGE as the embedding model and FAISS for similarity search. To speed up patch verification while preserving performance, the per-instance run timeout was changed from the default 300s to 180s.
[3] OpenHands v0.29.1 did not use YaRN context extension, but limited runs to a maximum of 60 iterations and summarized the history to avoid exceeding the 32K context limit. Summarization was configured with llm_config="condenser", keep_first=1, max_size=32. Failed trajectories were not retried.
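For convenience, the footnote settings can be collected in one place. The dicts below are a plain transcription of the parameters in footnotes [1]-[3], purely illustrative, not a real configuration API for any of the three frameworks:

```python
# Transcription of the evaluation settings in footnotes [1]-[3].
# Illustrative only; not an actual Moatless/Agentless/OpenHands API.
EVAL_SETTINGS = {
    "moatless": {"version": "0.0.3", "response_format": "react",
                 "thoughts_in_action": False, "max_iterations": 30,
                 "retry_failed": False},
    "agentless": {"version": "1.5.0", "embedding_model": "BGE",
                  "similarity_search": "FAISS", "run_timeout_s": 180},
    "openhands": {"version": "0.29.1", "yarn_context_extension": False,
                  "max_iterations": 60,
                  "condenser": {"llm_config": "condenser", "keep_first": 1, "max_size": 32},
                  "retry_failed": False},
}
```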
🔧 Technical Details
The source documentation does not provide specific implementation details, so this section is skipped.
📄 License
This project is licensed under the MIT License.