🚀 Introduction to Cohere Labs Command R+
Cohere Labs Command R+ is a 104-billion-parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. The model is multilingual and performs well across a variety of use cases such as reasoning, summarization, and question answering.
🚀 Quick Start
You can try out Cohere Labs Command R+ in our hosted Hugging Face Space before downloading the weights.
Please install transformers from the source repository that includes the necessary changes for this model:
# pip install 'git+https://github.com/huggingface/transformers.git' bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "CohereForAI/c4ai-command-r-plus-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# Format message with the command-r-plus chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)
gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
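The checkpoint above is already quantized with bitsandbytes. As a minimal sketch (not part of the original card), you could instead quantize the full-precision weights yourself at load time; the repository id CohereForAI/c4ai-command-r-plus and the 4-bit settings below are assumptions:

# Minimal sketch (assumption, not from the original card): load the full-precision
# checkpoint and quantize it to 4-bit on the fly with bitsandbytes.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
full_precision_id = "CohereForAI/c4ai-command-r-plus"  # assumed full-precision repo id
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(full_precision_id)
model = AutoModelForCausalLM.from_pretrained(
    full_precision_id,
    quantization_config=bnb_config,
    device_map="auto",
)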
✨ Key Features
Model Overview
Cohere Labs Command R+ is an open-weights research release of a 104-billion-parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. It supports multi-step tool use, which allows the model to combine multiple tools over multiple steps to accomplish difficult tasks. It is a multilingual model whose performance was evaluated in 10 languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese. Command R+ is optimized for a variety of use cases including reasoning, summarization, and question answering.
Cohere Labs Command R+ is part of a family of open-weight releases from Cohere For AI and Cohere. Our smaller companion model is Cohere Labs Command R.
Model Details
- Input: The model takes text only as input.
- Output: The model generates text only as output.
- Model Architecture: This is an auto-regressive language model that uses an optimized transformer architecture. After pretraining, the model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
- Languages covered: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. Pre-training data additionally included the following 13 languages: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
- Context length: Command R+ supports a context length of 128K.
Tool Use & Multihop Capabilities
Command R+ has been specifically trained with conversational tool use capabilities. These have been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template may reduce performance, but we encourage experimentation.
Command R+'s tool use functionality takes a conversation as input (with an optional user-system preamble), along with a list of available tools. The model will then generate a json-formatted list of actions to execute on a subset of those tools. Command R+ may use one of its supplied tools more than once.
The model has been trained to recognise a special directly_answer tool, which it uses to indicate that it does not want to use any other tool. The ability to abstain from calling a specific tool can be useful in a range of situations, such as greeting a user or asking clarifying questions. We recommend including the directly_answer tool, but it can be removed or renamed if required.
Comprehensive documentation for working with the Command R+ tool use prompt template can be found here.
Command R+ also supports Hugging Face's tool use API.
Below is a minimal working example of how to render a prompt:
Usage: Rendering Tool Use Prompts
from transformers import AutoTokenizer
model_id = "CohereForAI/c4ai-command-r-plus-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]

# Define tools available for the model to use:
tools = [
    {
        "name": "internet_search",
        "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
        "parameter_definitions": {
            "query": {
                "description": "Query to search the internet with",
                "type": "str",
                "required": True
            }
        }
    },
    {
        "name": "directly_answer",
        "description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
        "parameter_definitions": {}
    }
]

# render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_tool_use_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(tool_use_prompt)
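The rendered string already contains the special tokens, so it can be tokenized and passed straight to the model. A minimal sketch, assuming the model has been loaded as in the Quick Start section above (add_special_tokens=False avoids prepending a second BOS token):

# Sketch: generate the "Action:" completion from the rendered tool use prompt.
input_ids = tokenizer(tool_use_prompt, return_tensors="pt", add_special_tokens=False).input_ids
gen_tokens = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# decode only the newly generated tokens, i.e. the tool call completion:
print(tokenizer.decode(gen_tokens[0][input_ids.shape[1]:], skip_special_tokens=True))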
Usage: Rendering prompts with the Tool Use API
from transformers import AutoTokenizer
model_id = "CohereForAI/c4ai-command-r-plus-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]

# Define tools available for the model to use
# Type hints and docstrings from Python functions are automatically extracted
def internet_search(query: str):
    """
    Returns a list of relevant document snippets for a textual query retrieved from the internet

    Args:
        query: Query to search the internet with
    """
    pass

def directly_answer():
    """
    Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass

tools = [internet_search, directly_answer]

# render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_chat_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(tool_use_prompt)
Example Rendered Tool Use Prompt
<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble
The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral.
# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.
# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.
## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.
## Available Tools
Here is a list of tools that you have available to you:
```python
def internet_search(query: str) -> List[Dict]:
    """Returns a list of relevant document snippets for a textual query retrieved from the internet

    Args:
        query (str): Query to search the internet with
    """
    pass
```
```python
def directly_answer() -> List[Dict]:
    """Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass
```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Whats the biggest penguin in the world?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>Write 'Action:' followed by a json-formatted list of actions that you want to perform in order to produce a good response to the user's last input. You can use any of the supplied tools any number of times, but you should aim to execute the minimum number of necessary actions for the input. You should use the `directly-answer` tool if calling the other tools is unnecessary. The list of actions you want to call should be formatted as a list of json objects, for example:
```json
[
    {
        "tool_name": title of the tool in the specification,
        "parameters": a dict of parameters to input into the tool as they are defined in the specs, or {} if it takes no parameters
    }
]```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
Example Rendered Tool Use Completion
Action: ```json
[
    {
        "tool_name": "internet_search",
        "parameters": {
            "query": "biggest penguin in the world"
        }
    }
]
```
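The `Action:` completion is plain JSON inside a fenced block, so it can be parsed with the standard library before dispatching the calls to your own tool implementations. A minimal sketch (the helper below is illustrative and not part of the model card):

import json
import re

def parse_actions(completion: str):
    """Extract the json-formatted list of actions from a Command R+ tool use completion."""
    match = re.search(r"Action:\s*```json\s*(.*?)\s*```", completion, re.DOTALL)
    if match is None:
        return []
    return json.loads(match.group(1))

# example completion string, as in the rendered example above:
completion = 'Action: ```json\n[{"tool_name": "internet_search", "parameters": {"query": "biggest penguin in the world"}}]\n```'
for action in parse_actions(completion):
    print(action["tool_name"], action["parameters"])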
Grounded Generation and RAG Capabilities
Command R+ has been specifically trained with grounded generation capabilities. This means it can generate responses based on a list of supplied document snippets, and it will include grounding spans (citations) in its response indicating the source of the information. This can be used to enable behaviors such as grounded summarization and the final step of Retrieval Augmented Generation (RAG). This behavior has been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template may reduce performance, but we encourage experimentation.
Command R+'s grounded generation behavior takes a conversation as input (with an optional user-supplied system preamble indicating task, context, and desired output style), along with a list of retrieved document snippets. The document snippets should be chunks rather than long documents, typically around 100-400 words per chunk. Document snippets consist of key-value pairs. The keys should be short descriptive strings; the values can be text or semi-structured.
By default, Command R+ will generate grounded responses by first predicting which documents are relevant, then predicting which it will cite, then generating an answer, and finally inserting grounding spans into the answer. See the example below; this is referred to as accurate grounded generation.
The model is trained with a number of other answering modes, which can be selected by prompt changes. A fast citation mode is supported in the tokenizer, which will directly generate an answer with grounding spans in it, without first writing the answer out in full. This sacrifices some grounding accuracy in favor of generating fewer tokens.
Comprehensive documentation for working with the Command R+ grounded generation prompt template can be found here.
Below is a minimal working example of how to render a prompt:
Usage: Rendering Grounded Generation prompts
from transformers import AutoTokenizer
model_id = "CohereForAI/c4ai-command-r-plus-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]

# define documents to ground on:
documents = [
    { "title": "Tall penguins", "text": "Emperor penguins are the tallest growing up to 122 cm in height." },
    { "title": "Penguin habitats", "text": "Emperor penguins only live in Antarctica."}
]

# render the grounded generation prompt as a string:
grounded_generation_prompt = tokenizer.apply_grounded_generation_template(
    conversation,
    documents=documents,
    citation_mode="accurate", # or "fast"
    tokenize=False,
    add_generation_prompt=True,
)
print(grounded_generation_prompt)
Example Rendered Grounded Generation Prompt
<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble
The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral.
# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.
# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.
## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Whats the biggest penguin in the world?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|><results>
Document: 0
title: Tall penguins
text: Emperor penguins are the tallest growing up to 122 cm in height.
Document: 1
title: Penguin habitats
text: Emperor penguins only live in Antarctica.
</results><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>Carefully perform the following instructions, in order, starting each with a new line.
Firstly, Decide which of the retrieved documents are relevant to the user's last input by writing 'Relevant Documents:' followed by comma-separated list of document numbers. If none are relevant, you should instead write 'None'.
Secondly, Decide which of the retrieved documents contain facts that should be cited in a good answer to the user's last input by writing 'Cited Documents:' followed a comma-separated list of document numbers. If you dont want to cite any of them, you should instead write 'None'.
Thirdly, Write 'Answer:' followed by a response to the user's last input in high quality natural english. Use the retrieved documents to help you. Do not insert any citations or grounding markup.
Finally, Write 'Grounded answer:' followed by a response to the user's last input in high quality natural english. Use the symbols <co: doc> and </co: doc> to indicate when a fact comes from a document in the search result, e.g <co: 0>my fact</co: 0> for a fact from document 0.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
Example Rendered Grounded Generation Completion
Relevant Documents: 0,1
Cited Documents: 0,1
Answer: The Emperor Penguin is the tallest or biggest penguin in the world. It is a bird that lives only in Antarctica and grows to a height of around 122 centimetres.
Grounded answer: The <co: 0>Emperor Penguin</co: 0> is the <co: 0>tallest</co: 0> or biggest penguin in the world. It is a bird that <co: 1>lives only in Antarctica</co: 1> and <co: 0>grows to a height of around 122 centimetres.</co: 0>
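The grounding markup in the `Grounded answer:` line is simple enough to post-process with a regular expression. A minimal sketch (illustrative, not part of the model card) that turns the `<co: N>...</co: N>` spans into (document id, cited text) pairs:

import re

def extract_citations(grounded_answer: str):
    """Return (document_id, cited_text) pairs from <co: N>...</co: N> grounding spans."""
    return [
        (int(doc_id), text)
        for doc_id, text in re.findall(r"<co: (\d+)>(.*?)</co: \1>", grounded_answer)
    ]

# using the example completion shown above:
grounded_answer = (
    "The <co: 0>Emperor Penguin</co: 0> is the <co: 0>tallest</co: 0> or biggest penguin in the world. "
    "It is a bird that <co: 1>lives only in Antarctica</co: 1> and "
    "<co: 0>grows to a height of around 122 centimetres.</co: 0>"
)
for doc_id, text in extract_citations(grounded_answer):
    print(doc_id, text)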
Code Capabilities
Command R+ has been optimized to interact with your code, by requesting code snippets, code explanations, or code rewrites. It might not perform well out-of-the-box for pure code-completion. For better performance, we also recommend using a low temperature (or even greedy decoding) for code-generation related instructions.
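For example, here is a minimal sketch of greedy decoding for a code-related instruction, assuming the model and tokenizer loaded in the Quick Start section (the prompt text itself is just an illustration):

# Greedy decoding (do_sample=False) tends to work better for code-related instructions.
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
gen_tokens = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(gen_tokens[0][input_ids.shape[1]:], skip_special_tokens=True))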
📚 Detailed Documentation
Model Information
- Developed by: Cohere and Cohere For AI
- Point of Contact: Cohere For AI: cohere.for.ai
- License: CC-BY-NC, and also requires adhering to Cohere Lab's Acceptable Use Policy
- Model: c4ai-command-r-plus
- Model Size: 104 billion parameters
- Context length: 128K
Try the Model
You can try out Cohere Labs Command R+ in our hosted Hugging Face Space before downloading the weights. You can also try Command R+ chat in the playground here.
📄 License
This model is governed by a CC-BY-NC license and also requires adhering to Cohere Lab's Acceptable Use Policy.
⚠️ Important Notice
By submitting this form, you agree to the License Agreement and acknowledge that the information you provide will be collected, used, and shared in accordance with Cohere's Privacy Policy. You'll receive email updates about Cohere Labs and Cohere research, events, products, and services. You can unsubscribe at any time.
Attribute | Details |
---|---|
Languages covered | English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese. Pre-training data additionally included Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian. |
License | CC-BY-NC, subject to Cohere Lab's Acceptable Use Policy |
Model type | Auto-regressive language model using an optimized transformer architecture |
Model size | 104 billion parameters |
Context length | 128K |



