🚀 GLM-4-Z1-Rumination-32B-0414
GLM-4-Z1-Rumination-32B-0414 is a powerful text-generation model and part of the new generation of open-source models in the GLM family. With 32 billion parameters, its performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series, and it supports convenient local deployment. The model performs well across many domains, with particular strengths in deep reasoning and solving complex tasks.
✨ Key Features
- High performance: The GLM-4-32B-0414 series achieves strong results in engineering code, Artifact generation, function calling, search-based Q&A, and report generation, with some benchmark results rivaling much larger models such as GPT-4o and DeepSeek-V3-0324 (671B).
- Deep reasoning: GLM-Z1-32B-0414 is a reasoning model built on GLM-4-32B-0414. Through cold start and extended reinforcement learning, plus further training on mathematics, code, and logic tasks, it substantially improves mathematical ability and the capacity to solve complex tasks.
- Rumination capability: GLM-Z1-Rumination-32B-0414 is a deep reasoning model with rumination capability. It integrates search tools into its deep thinking process, can handle more open-ended and complex problems, and shows marked improvements in research-style writing and complex retrieval tasks.
- Big capability in a small model: GLM-Z1-9B-0414, despite its smaller scale, still delivers excellent performance on mathematical reasoning and general tasks, leads open-source models of comparable size, and offers an efficient option for resource-constrained scenarios.
📦 Installation
Make sure the installed `transformers` version is 4.51.3 or later:

```bash
pip install "transformers>=4.51.3"
```
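To confirm the installed version, a quick sanity check (using the `packaging` library, which `transformers` already depends on):

```python
from importlib.metadata import version
from packaging.version import Version

# The chat template used below requires transformers >= 4.51.3.
assert Version(version("transformers")) >= Version("4.51.3")
```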
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "THUDM/GLM-Z1-Rumination-32B-0414"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

message = [{"role": "user", "content": "Let a, b be positive real numbers such that ab = a + b + 3. Determine the range of possible values for a + b."}]

inputs = tokenizer.apply_chat_template(
    message,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)

generate_kwargs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "temperature": 0.95,
    "top_p": 0.7,
    "do_sample": True,
    "max_new_tokens": 16384,  # rumination traces are long; the default limit would truncate them
}
out = model.generate(**generate_kwargs)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
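The rumination model emits a long thinking trace before the answer, so streaming the output can make runs easier to follow. A minimal sketch using transformers' `TextStreamer`, reusing `model`, `tokenizer`, and `generate_kwargs` from the example above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the prompt echo.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**generate_kwargs, streamer=streamer)
```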
Advanced Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import re
import json

MODEL_PATH = "THUDM/GLM-Z1-Rumination-32B-0414"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

messages = [{"role": "user", "content": "Let a, b be positive real numbers such that ab = a + b + 3. Determine the range of possible values for a + b."}]

generate_kwargs = {
    "temperature": 0.95,
    "top_p": 0.7,
    "do_sample": True,
    "max_new_tokens": 16384,
}


def get_assistant():
    """Run one generation turn and return only the newly generated text."""
    inputs = tokenizer.apply_chat_template(
        messages,
        return_tensors="pt",
        add_generation_prompt=True,
        return_dict=True,
    ).to(model.device)
    out = model.generate(input_ids=inputs["input_ids"], **generate_kwargs)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()


def get_observation(function_name, args):
    """Return a mocked tool result; replace these stubs with real search/browse backends."""
    if function_name == "search":
        mock_search_res = [
            {"title": "t1", "url": "url1", "snippet": "snippet_content_1"},
            {"title": "t2", "url": "url2", "snippet": "snippet_content_2"},
        ]
        return "\n\n".join(
            f"【{i}†{res['title']}†{res['url']}\n{res['snippet']}】"
            for i, res in enumerate(mock_search_res)
        )
    elif function_name == "click":
        return "main content"
    elif function_name == "open":
        return "main_content"
    else:
        raise ValueError("unsupported function name!")


def get_func_name_args(llm_text):
    """Parse the JSON tool call that follows the model's </think> block."""
    function_call = re.sub(r'.*?</think>', '', llm_text, flags=re.DOTALL)
    function_call = json.loads(function_call)
    return function_call['name'], function_call['arguments']


def pipeline():
    end_str = '{"name": "finish", "arguments": {}}'
    response = get_assistant()
    messages.append({"role": "assistant", "content": response})
    max_turns, turns = 35, 1
    # Execute tool calls until the model emits a "finish" call or the turn limit is hit.
    while not response.endswith(end_str) and turns < max_turns:
        action, params = get_func_name_args(response)
        observation = get_observation(action, params)
        messages.append({"role": "observation", "content": observation})
        response = get_assistant()
        messages.append({"role": "assistant", "content": response})
        turns += 1
    if response.endswith(end_str):
        # One more turn after "finish" produces the final written answer.
        final_answer = get_assistant()
    else:
        final_answer = None
    return final_answer


final_answer = pipeline()
```
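`pipeline()` returns the model's final response once it emits a `finish` call, or `None` if the turn limit is reached first. The final response may still open with a thinking trace, so here is a minimal sketch for extracting just the written report, reusing `final_answer` from above and the same `</think>` split as `get_func_name_args` (if no `</think>` tag is present, the text passes through unchanged):

```python
import re

if final_answer is not None:
    # Drop everything up to and including the </think> block,
    # keeping only the final written report.
    report = re.sub(r'.*?</think>', '', final_answer, flags=re.DOTALL).strip()
    print(report)
```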
📚 Documentation
Function Calls
By default, the model currently supports the following function calls:
- `search`: search with keywords and return the search results
- `click`: click a specific webpage in the search results to view its details
- `open`: open a given URL to view its detailed content
- `finish`: finish collecting information and begin writing
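For reference, these calls arrive as a single JSON object after the model's `</think>` block, which is what `get_func_name_args` in the advanced example parses. A small sketch of the payload shapes, where the `finish` form matches `end_str` in the pipeline above and the `search` argument key is an illustrative assumption:

```python
import json

examples = [
    '{"name": "search", "arguments": {"query": "GLM-4 release notes"}}',  # argument key assumed for illustration
    '{"name": "finish", "arguments": {}}',  # matches end_str in the pipeline above
]
for raw in examples:
    call = json.loads(raw)
    print(call["name"], "->", call["arguments"])
```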
📄 License
This project is released under the MIT License.