Cogito v1 Preview open-source model: 30+ languages, built for coding and general questions

Cogito V1 Preview Qwen 32B Exl2 4.65bpw

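Developed by async0x42

Cogito v1 Preview is an instruction-tuned generative model based on Qwen2.5-32B. It supports more than 30 languages and a 128k context length, and is optimized for coding, STEM, instruction following, and general helpfulness.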

Large language model

Transformers

License: Apache-2.0 #hybrid-reasoning-model #128k-long-context #multilingual-coding-optimized

Downloads: 27

Released: 4/9/2025

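Model overview

Cogito LLMs are instruction-tuned generative models (text in/text out). They support hybrid reasoning (a standard mode and a deep thinking mode), are trained with Iterated Distillation and Amplification (IDA), and are suited to multilingual, coding, and tool-calling use cases.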

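Model features

Hybrid reasoning

Supports a standard mode and a deep thinking mode (self-reflective reasoning), and can switch between them as needed.

Iterated Distillation and Amplification (IDA) training

Uses a scalable and efficient alignment strategy for superintelligence, based on iterative self-improvement.

Multilingual support

Trained on more than 30 languages, with a context length of 128k.

Tool calling

Supports single, parallel, multiple, and parallel-multiple tool calls in both standard and deep thinking modes.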

Model capabilities

Text generation

Instruction following

Coding assistance

STEM question answering

Multilingual processing

Tool calling

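Use cases

Coding assistance

Script generation

Generates bash scripts for specific tasks, such as transposing a matrix.

Produces a complete, functional script.

General Q&A

Knowledge questions

Answers knowledge questions, such as explaining what large language models are.

Provides accurate, detailed explanations.

Tool integration

Weather lookup

Retrieves the current temperature for a given location through a tool call.

Calls the tool correctly and returns the result.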

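🚀 Cogito v1 Preview - 32B

Cogito models are instruction-tuned generative LLMs (text in/text out). All models are released under an open license that permits commercial use. The model offers hybrid reasoning, supports multilingual, coding, and tool-calling tasks, and performs well on common industry benchmarks.

Blog post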

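✨ Key features

Hybrid reasoning: Cogito models are hybrid reasoning models. Each model can answer directly (standard LLM mode) or self-reflect before answering (like a reasoning model).
Advanced training strategy: The LLMs are trained with Iterated Distillation and Amplification (IDA), a scalable and efficient alignment strategy for superintelligence that relies on iterative self-improvement.
Optimized across capabilities: The models are optimized for coding, STEM, instruction following, and general helpfulness, and have significantly stronger multilingual, coding, and tool-calling capabilities than counterparts of the same size. In both standard and reasoning modes, Cogito v1 Preview models outperform their size-equivalent counterparts on common industry benchmarks.
Multilingual support and long context: Each model is trained on more than 30 languages and supports a context length of 128k.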

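📚 Detailed documentation

Evaluation

We compared our models against state-of-the-art models of equivalent size in both direct mode and reasoning mode. In direct mode, we compare against the Llama / Qwen instruct counterparts. In reasoning mode, we compare against Deepseek's R1 distilled counterparts / Qwen's QwQ model.

[Figure: Livebench global average scores]

For detailed evaluations, please refer to the blog post.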

💻 Usage examples

Basic usage

The following example calls the model through the Transformers library:

import transformers
import torch

model_id = "deepcogito/cogito-v1-preview-qwen-32B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Give me a short introduction to LLMs."},
]

outputs = pipeline(
    messages,
    max_new_tokens=512,
)

print(outputs[0]["generated_text"][-1])

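Advanced usage

Enabling deep thinking

By default, the model answers in standard mode.
To enable thinking mode, use either of the following two methods:
- Method 1: add a specific system prompt. To enable thinking, set the system prompt to system_instruction = 'Enable deep thinking subroutine.'. If you already have a system prompt, use system_instruction = 'Enable deep thinking subroutine.' + '\n\n' + system_instruction.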

import transformers
import torch

model_id = "deepcogito/cogito-v1-preview-qwen-32B"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION},
    {"role": "user", "content": "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."},
]

outputs = pipeline(
    messages,
    max_new_tokens=512,
)

print(outputs[0]["generated_text"][-1])

If you already have a system prompt, prepend DEEP_THINKING_INSTRUCTION to it as follows:

DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

system_prompt = "Reply to each prompt with only the actual code - no explanations."
prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION + '\n\n' + system_prompt},
    {"role": "user", "content": prompt}
]
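
The two Method 1 cases above (no existing system prompt, and an existing one) can be folded into a single helper. This is a minimal sketch, not part of the original model card: the function names `build_system_prompt` and `build_messages` are hypothetical, and only the instruction string and the `'\n\n'` separator come from the text above.

```python
DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."


def build_system_prompt(system_prompt=None, deep_thinking=False):
    """Return the system prompt, optionally prefixed with the deep thinking instruction.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not deep_thinking:
        return system_prompt or ""
    if not system_prompt:
        return DEEP_THINKING_INSTRUCTION
    # Instruction first, then a blank line, then the existing system prompt.
    return DEEP_THINKING_INSTRUCTION + "\n\n" + system_prompt


def build_messages(user_prompt, system_prompt=None, deep_thinking=False):
    """Assemble a chat message list; the system message is omitted when empty."""
    messages = []
    system = build_system_prompt(system_prompt, deep_thinking)
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

The resulting list can be passed to the pipeline in the same way as the `messages` lists above.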

- Method 2: set enable_thinking=True in the tokenizer. If you are using the Hugging Face tokenizer, pass enable_thinking=True when applying the chat template.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-v1-preview-qwen-32B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to LLMs."
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
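
In thinking mode the decoded response contains the model's reasoning as well as its final answer, and an application often wants to show only the answer. The sketch below separates the two. It assumes the reasoning is wrapped in `<think>...</think>` tags; the source does not state the delimiter, so check a real response before relying on it. The name `split_thinking` is hypothetical.

```python
import re

# Assumption: reasoning is delimited by <think>...</think>; the source does not specify this.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(response):
    """Return (reasoning, answer); reasoning is None when no <think> block is found."""
    match = THINK_PATTERN.search(response)
    if match is None:
        return None, response.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is treated as the answer.
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer
```

If the tags turn out to be different, only `THINK_PATTERN` needs to change.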

Tool calling

Cogito models support tool calling (single, parallel, multiple, and parallel-multiple) in both standard and extended thinking modes.

# First, define a tool
def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.
    
    Args:
        location: The location to get the temperature for, in the format "City, Country"
    Returns:
        The current temperature at the specified location, as a float.
    """
    return 22.  # A real function should probably actually get the temperature!

# Next, create a chat and apply the chat template
# (`tokenizer` and `model` are the objects loaded in the enable_thinking example above)
messages = [
  {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]

# Render the prompt as text (tokenize=False) so its length can be sliced off the decoded output below
text = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
output_text = tokenizer.batch_decode(outputs)[0][len(text):]
print(output_text)

This produces the following output:

<tool_call>
{"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
</tool_call><|im_end|>

Generating text from this input works as usual. If the model generates a tool call, add it to the chat:

tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
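
The snippet above writes the tool call out by hand. In practice it has to be recovered from the generated text. The sketch below extracts every JSON payload wrapped in `<tool_call>...</tool_call>` tags, following the output format shown earlier, which also covers parallel calls. The names `parse_tool_calls` and `to_assistant_message` are hypothetical, and skipping malformed payloads is one possible policy rather than a documented behavior.

```python
import json
import re

# Matches the <tool_call>{...}</tool_call> blocks shown in the sample output.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(output_text):
    """Extract every JSON tool call wrapped in <tool_call>...</tool_call> tags."""
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(output_text):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            # Skip malformed payloads rather than failing the whole turn.
            continue
    return calls


def to_assistant_message(calls):
    """Build the assistant message that records the tool calls in the chat."""
    return {
        "role": "assistant",
        "tool_calls": [{"type": "function", "function": call} for call in calls],
    }
```

With these helpers, `messages.append(to_assistant_message(parse_tool_calls(output_text)))` would replace the hand-written call.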

Then call the tool and add the result to the chat with the tool role:

messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
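
The line above hardcodes the tool result. A small dispatcher can instead look up the requested tool by name, run it with the model's arguments, and build the tool-role message. This is a sketch under assumptions: `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, and the stub tool returns a fixed value just as the example tool does.

```python
def get_current_temperature(location: str) -> float:
    """Stub tool: a real implementation would query a weather service."""
    return 22.0


# Hypothetical registry mapping tool names to Python callables.
TOOL_REGISTRY = {"get_current_temperature": get_current_temperature}


def run_tool_call(tool_call, registry=TOOL_REGISTRY):
    """Execute one parsed tool call and return the matching tool-role message."""
    name = tool_call["name"]
    if name not in registry:
        raise KeyError(f"Unknown tool: {name}")
    result = registry[name](**tool_call["arguments"])
    # Tool results are added to the chat as strings.
    return {"role": "tool", "name": name, "content": str(result)}
```

For parallel tool calls, the same function can be applied to each parsed call in turn, appending one tool message per call.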

After that, call generate() again so the model can use the tool result in the chat:

text = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
output_text = tokenizer.batch_decode(outputs)[0][len(text):]

This produces the following string:

'The current temperature in Paris is 22.0 degrees.<|im_end|>'

📄 License

This repository and the model weights are licensed under the Apache 2.0 License.

📞 Contact

To reach our team, send an email to contact@deepcogito.com.
