c4ai-command-r-plus-4bit Open-Source Large Language Model: Multilingual Support, Long-Context Interaction, and Advanced Features

C4ai Command R Plus 4bit

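Developed by CohereLabs

Cohere Labs Command R+ is a 104-billion-parameter multilingual large language model with advanced capabilities including Retrieval Augmented Generation (RAG) and tool use, and it supports a 128K context length.

Large Language Model

Transformers

Supports multiple languages #Multi-tool collaborative reasoning #Multilingual RAG generation #128K long-text processing

Downloads: 316

Released: 4/3/2024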

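Model Introduction

Command R+ is an open-weights research release developed by Cohere. It focuses on complex tasks such as reasoning, summarization, and question answering, and supports 10 major languages.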

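Model Features

Multi-hop tool use

Supports combining tool calls across multiple steps, and can generate a JSON-formatted list of actions to carry out complex task flows

Retrieval Augmented Generation (RAG)

Generates responses with grounded citations based on supplied document snippets, and supports two citation modes: accurate and fast

Very long context

Supports a context window of 128K tokens, suited to long documents and complex conversations

Multilingual optimization

Performance is optimized for 10 major languages, and the pre-training data covers 13 additional languages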

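Model Capabilities

Multilingual text generation

Complex task automation

Factual question answering

Multi-document summarization

Tool call integration

Long-context understanding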

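Use Cases

Knowledge Q&A

Fact checking

Provides accurate, cited answers grounded in up-to-date documents

Generates answers containing source markers such as [1][2]

Enterprise Automation

Workflow automation

Completes multi-step business processes by chaining API tool calls

Automatically generates tool-call JSON with parameters

Content Processing

Long-document analysis

Extracts key information from documents of up to 128K tokens

Generates summary reports with section references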

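🚀 Cohere Labs Command R+ Model Card

Cohere Labs Command R+ is a highly capable model with advanced features such as Retrieval Augmented Generation (RAG) and tool use for handling complex tasks. It is multilingual and performs well across reasoning, summarization, and question answering.

🚀 Quick Start

You can try Cohere Labs Command R+ in our hosted Hugging Face Space before downloading the weights.

Please install transformers from the source repository, which includes the changes this model requires: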

# pip install 'git+https://github.com/huggingface/transformers.git' bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-plus-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format message with the command-r-plus chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids, 
    max_new_tokens=100, 
    do_sample=True, 
    temperature=0.3,
    )

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
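
The comment in the snippet above shows the turn layout that the chat template produces. As a rough illustration of that layout only, the sketch below rebuilds the same string in plain Python. `render_chat_prompt` and `ROLE_TOKENS` are hypothetical names, and the mapping of the "assistant" role to `<|CHATBOT_TOKEN|>` is an assumption; for real use, rely on `tokenizer.apply_chat_template`.

```python
# Illustrative only: mirrors the turn layout shown in the quick start comment.
ROLE_TOKENS = {
    "system": "<|SYSTEM_TOKEN|>",
    "user": "<|USER_TOKEN|>",
    "assistant": "<|CHATBOT_TOKEN|>",  # assumption: "assistant" maps to the chatbot token
}


def render_chat_prompt(messages, add_generation_prompt=True):
    """Render messages into the turn-token layout (hypothetical helper)."""
    parts = ["<BOS_TOKEN>"]
    for message in messages:
        parts.append(
            "<|START_OF_TURN_TOKEN|>"
            + ROLE_TOKENS[message["role"]]
            + message["content"]
            + "<|END_OF_TURN_TOKEN|>"
        )
    if add_generation_prompt:
        # Open a chatbot turn so the model continues with its reply.
        parts.append("<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>")
    return "".join(parts)


prompt = render_chat_prompt([{"role": "user", "content": "Hello, how are you?"}])
print(prompt)
```

Seeing the layout spelled out can help when inspecting decoded output, since `tokenizer.decode` returns the prompt tokens followed by the generated reply.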

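✨ Key Features

Model Summary

Cohere Labs Command R+ is an open-weights research release of a 104-billion-parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. Its tool use supports multi-step tool use, which allows the model to combine multiple tools over multiple steps to accomplish difficult tasks. Cohere Labs Command R+ is a multilingual model evaluated for performance in 10 languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese. Command R+ is optimized for a variety of use cases, including reasoning, summarization, and question answering.

Cohere Labs Command R+ is part of a family of open-weight releases from Cohere For AI and Cohere. Our smaller companion model is Cohere Labs Command R.

Developed by: Cohere and Cohere For AI
Point of contact: Cohere For AI: cohere.for.ai
License: CC-BY-NC; use must also adhere to Cohere Lab's Acceptable Use Policy
Model: c4ai-command-r-plus
Model size: 104 billion parameters
Context length: 128K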

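Model Details

Input: The model accepts text input only.
Output: The model generates text output only.
Model architecture: This is an auto-regressive language model that uses an optimized transformer architecture. After pre-training, the model uses supervised fine-tuning (SFT) and preference training to align its behavior with human preferences for helpfulness and safety.
Languages covered: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. The pre-training data additionally included the following 13 languages: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
Context length: Command R+ supports a context length of 128K.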
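
A practical consequence of a fixed context length is that the prompt and the generation budget have to fit in the window together. The sketch below shows one naive way to check and enforce that. It assumes "128K" means 128,000 tokens, which is an assumption (the exact limit should be read from the model configuration), and `fits_in_context` and `truncate_to_budget` are hypothetical helpers.

```python
CONTEXT_LIMIT = 128_000  # assumption: "128K" read as 128,000 tokens


def fits_in_context(prompt_tokens, max_new_tokens, limit=CONTEXT_LIMIT):
    """Return True if the prompt plus the generation budget fits in the window."""
    return prompt_tokens + max_new_tokens <= limit


def truncate_to_budget(token_ids, max_new_tokens, limit=CONTEXT_LIMIT):
    """Keep the most recent tokens so that room remains for generation.

    Naive: dropping tokens from the left also drops any leading special
    tokens, so a real pipeline would truncate the text before templating.
    """
    budget = limit - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return token_ids[-budget:]


print(fits_in_context(prompt_tokens=127_900, max_new_tokens=100))
print(truncate_to_budget(list(range(10)), max_new_tokens=2, limit=5))
```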

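Tool Use and Multi-hop Capabilities

Command R+ has been specifically trained with conversational tool use capabilities. These were trained into the model through a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template will likely reduce performance, but we encourage experimentation.

Command R+'s tool use functionality takes a conversation as input (with an optional user-system preamble), along with a list of available tools. The model then generates a JSON-formatted list of actions to execute on a subset of those tools. Command R+ may use one of its supplied tools more than once.

The model has been trained to recognize a special directly_answer tool, which it uses to indicate that it does not want to use any of its other tools. The ability to abstain from calling a specific tool can be useful in a range of situations, such as greeting a user or asking clarifying questions. We recommend including the directly_answer tool, but it can be removed or renamed if required.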
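
To make the role of `directly_answer` concrete, here is a minimal sketch of how an application might execute an already-parsed action list. The `tool_name` and `parameters` keys follow the action format in the tool use prompt; `run_actions`, `TOOL_REGISTRY`, and the stub `internet_search` are hypothetical and stand in for real tool implementations.

```python
def internet_search(query):
    """Stand-in search tool (hypothetical); returns a canned snippet."""
    return [{"title": "stub", "text": f"results for: {query}"}]


# Maps tool names the model may emit to Python callables.
TOOL_REGISTRY = {"internet_search": internet_search}


def run_actions(actions, registry=TOOL_REGISTRY):
    """Execute a parsed action list; return (tool_outputs, answer_directly)."""
    outputs = []
    answer_directly = False
    for action in actions:
        name = action["tool_name"]
        params = action.get("parameters", {})
        if name == "directly_answer":
            # The model signalled that no external tool is needed.
            answer_directly = True
            continue
        if name not in registry:
            raise KeyError(f"model requested unknown tool: {name}")
        outputs.append({"tool_name": name, "output": registry[name](**params)})
    return outputs, answer_directly


outputs, answer_directly = run_actions(
    [{"tool_name": "internet_search", "parameters": {"query": "penguin"}}]
)
print(outputs, answer_directly)
```

Because the model may call the same tool more than once, the loop collects one output per action rather than one per tool.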

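For detailed documentation on working with Command R+'s tool use prompt template, see the Cohere documentation.

Command R+ also supports Hugging Face's tool use API.

The code snippets below show minimal working examples of how to render a prompt:

Usage: Rendering Tool Use Prompts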

from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]
# Define tools available for the model to use:
tools = [
  {
    "name": "internet_search",
    "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
    "parameter_definitions": {
      "query": {
        "description": "Query to search the internet with",
        "type": "str",
        "required": True
      }
    }
  },
  {
    "name": "directly_answer",
    "description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
    "parameter_definitions": {}
  }
]

# render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_tool_use_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(tool_use_prompt)

Usage: Rendering Prompts with the Tool Use API

from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]

# Define tools available for the model to use
# Type hints and docstrings from Python functions are automatically extracted
def internet_search(query: str):
    """
    Returns a list of relevant document snippets for a textual query retrieved from the internet

    Args:
        query: Query to search the internet with
    """
    pass

def directly_answer():
    """
    Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass

tools = [internet_search, directly_answer]

# render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_chat_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(tool_use_prompt)
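
The snippet above passes plain Python functions and notes that type hints and docstrings are extracted automatically. To illustrate the idea, the sketch below uses the standard `inspect` module to turn a function into the dict-style definition used in the first snippet. `describe_tool` is a hypothetical helper with deliberately naive docstring parsing; it is not the transformers implementation.

```python
import inspect


def describe_tool(func):
    """Build a dict-style tool definition from a Python function (hypothetical helper)."""
    doc = inspect.getdoc(func) or ""
    summary, _, args_section = doc.partition("Args:")
    # Collect "name: description" pairs from the Args section, if any.
    arg_docs = {}
    for line in args_section.splitlines():
        name, sep, text = line.partition(":")
        if sep:
            arg_docs[name.strip()] = text.strip()
    parameters = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.annotation is inspect.Parameter.empty:
            type_name = "str"  # assumption: fall back to str when unannotated
        else:
            type_name = getattr(param.annotation, "__name__", str(param.annotation))
        parameters[name] = {
            "description": arg_docs.get(name, ""),
            "type": type_name,
            # A parameter without a default value is treated as required.
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": summary.strip(),
        "parameter_definitions": parameters,
    }


def internet_search(query: str):
    """
    Returns a list of relevant document snippets for a textual query retrieved from the internet

    Args:
        query: Query to search the internet with
    """
    pass


def directly_answer():
    """
    Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass


tools = [describe_tool(internet_search), describe_tool(directly_answer)]
print(tools)
```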

Example Rendered Tool Use Prompt

<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble
The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral.

# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.

# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.

## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.

## Available Tools
Here is a list of tools that you have available to you:

```python
def internet_search(query: str) -> List[Dict]:
    """Returns a list of relevant document snippets for a textual query retrieved from the internet

    Args:
        query (str): Query to search the internet with
    """
    pass

def directly_answer() -> List[Dict]:
    """Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass
```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Whats the biggest penguin in the world?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>Write 'Action:' followed by a json-formatted list of actions that you want to perform in order to produce a good response to the user's last input. You can use any of the supplied tools any number of times, but you should aim to execute the minimum number of necessary actions for the input. You should use the `directly-answer` tool if calling the other tools is unnecessary. The list of actions you want to call should be formatted as a list of json objects, for example:
```json
[
    {
        "tool_name": title of the tool in the specification,
        "parameters": a dict of parameters to input into the tool as they are defined in the specs, or {} if it takes no parameters
    }
]```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

Example Rendered Tool Use Completion

Action: ```json
[
      {
          "tool_name": "internet_search",
          "parameters": {
              "query": "biggest penguin in the world"
          }
      }
]
```
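
A completion in this shape has to be parsed before any tool can be called. The following sketch pulls the JSON list out of the `Action:` block with a regular expression and tolerates a missing closing fence. `parse_actions` is a hypothetical helper, and real completions may need more defensive handling than shown here.

```python
import json
import re

FENCE = "`" * 3  # the fence marker, built this way to avoid a literal fence here
ACTION_PATTERN = re.compile(
    r"Action:\s*" + FENCE + r"json\s*(.*?)\s*(?:" + FENCE + r"|$)",
    re.DOTALL,
)


def parse_actions(completion):
    """Extract the JSON action list from a tool use completion (hypothetical helper)."""
    match = ACTION_PATTERN.search(completion)
    if match is None:
        raise ValueError("no 'Action:' block found in completion")
    actions = json.loads(match.group(1))
    if not isinstance(actions, list):
        raise ValueError("expected a JSON list of actions")
    return actions


completion = (
    "Action: " + FENCE + "json\n"
    '[{"tool_name": "internet_search", '
    '"parameters": {"query": "biggest penguin in the world"}}]\n'
    + FENCE
)
actions = parse_actions(completion)
print(actions[0]["tool_name"], actions[0]["parameters"])
```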

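Grounded Generation and RAG Capabilities

Command R+ has been specifically trained with grounded generation capabilities. This means it can generate responses based on a supplied list of document snippets, and it includes grounding spans (citations) in its response to indicate the source of the information. This can be used to enable behaviors such as grounded summarization and the final step of Retrieval Augmented Generation (RAG). This behavior was trained into the model through a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template may reduce performance, but we encourage experimentation.

Command R+'s grounded generation behavior takes a conversation as input (with an optional user-supplied system preamble indicating the task, context, and desired output style), along with a list of retrieved document snippets. The document snippets should be chunks rather than long documents, typically around 100-400 words per chunk. Document snippets consist of key-value pairs. The keys should be short descriptive strings; the values can be text or semi-structured.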
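
As a rough illustration of preparing such snippets, the sketch below splits a long text into chunks of a bounded word count and wraps each one as a key-value document. The `title` and `text` keys match the usage example in this section; `chunk_document` is a hypothetical helper, the default of 300 words is simply a value inside the suggested 100-400 range, and the whitespace split is a naive stand-in for a real segmenter.

```python
def chunk_document(title, text, max_words=300):
    """Split text into snippets of at most max_words words (naive whitespace split)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    snippets = []
    for start in range(0, len(words), max_words):
        chunk = " ".join(words[start:start + max_words])
        snippets.append({"title": title, "text": chunk})
    return snippets


documents = chunk_document("Tall penguins", "word " * 650, max_words=300)
print([len(doc["text"].split()) for doc in documents])
```

The final chunk can fall below the suggested range; merging it into the previous chunk is one option when that matters.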

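By default, Command R+ generates grounded responses by first predicting which documents are relevant, then predicting which ones it will cite, then generating an answer, and finally inserting grounding spans into the answer. An example follows below. This is referred to as accurate grounded generation.

The model is also trained with a number of other answering modes, which can be selected through prompt changes. The tokenizer supports a fast citation mode, which directly generates an answer containing grounding spans without first writing the answer out in full. This sacrifices some grounding accuracy in exchange for generating fewer tokens.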
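
Whichever mode is used, the grounding spans have to be separated from the answer text before display. The sketch below does this under a loudly stated assumption: that spans are marked as `<co: 0>cited text</co: 0>`, with comma-separated indices for several documents. That marker format is not shown in this section, so verify it against real model output. `extract_citations` is a hypothetical helper.

```python
import re

# Assumed marker format: <co: 0>cited text</co: 0>, or <co: 0,1> for several documents.
CITATION = re.compile(r"<co: (\d+(?:,\s*\d+)*)>(.*?)</co: \1>", re.DOTALL)


def extract_citations(grounded_answer):
    """Return (plain_text, citations) from a grounded answer string."""
    citations = []
    for match in CITATION.finditer(grounded_answer):
        doc_ids = [int(part) for part in match.group(1).split(",")]
        citations.append({"text": match.group(2), "document_ids": doc_ids})
    # Drop the markers but keep the cited text in place.
    plain_text = CITATION.sub(lambda m: m.group(2), grounded_answer)
    return plain_text, citations


plain_text, citations = extract_citations(
    "The <co: 0>Emperor penguin</co: 0> <co: 1>lives only in Antarctica</co: 1>."
)
print(plain_text)
print(citations)
```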

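For detailed documentation on working with Command R+'s grounded generation prompt template, see the Cohere documentation.

The code snippet below shows a minimal working example of how to render a prompt:

Usage: Rendering Grounded Generation Prompts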

from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]
# define documents to ground on:
documents = [
    { "title": "Tall penguins", "text": "Emperor penguins are the tallest growing up to 122 cm in height." }, 
    { "title": "Penguin habitats", "text": "Emperor penguins only live in Antarctica."}
]

# render the grounded generation prompt as a string:
grounded_generation_prompt = tokenizer.apply_grounded_generation_template(
    conversation,
    documents=documents,
    citation_mode="accurate", # or "fast"
    tokenize=False,
    add_generation_prompt=True,
)
print(grounded_generation_prompt)

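Additional Information

Language support: The model supports multiple languages, including English, French, German, Spanish, Italian, Portuguese, Japanese, Korean, Chinese, and Arabic.
License: CC-BY-NC 4.0. To use this model, submit the form, agree to the License Agreement, and acknowledge that the information you provide will be collected, used, and shared in accordance with Cohere's Privacy Policy. You will receive email updates about Cohere Labs and Cohere research, events, products, and services, and you can unsubscribe at any time.
Extra fields: Access requires providing your name, affiliation, and country, and confirming that the model will be used for non-commercial purposes only.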