# 🚀 Cohere Labs Command-R
Cohere Labs Command-R is a high-performance generative model with 35 billion parameters. It is optimized for tasks such as reasoning, summarization, and question answering, and supports multilingual generation and powerful RAG capabilities.
## 🚀 Quick Start
### Try Cohere Labs Command R

If you want to try Command R before downloading the weights, the model is hosted in a Hugging Face Space here.
### Usage
Please use `transformers` version 4.39.1 or higher (for example, `pip install 'transformers>=4.39.1'`).
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereLabs/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
#### Quantized model through bitsandbytes, 8-bit precision
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "CohereLabs/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
#### Quantized model through bitsandbytes, 4-bit precision

You can find a version of this model quantized to 4-bit precision here.
## ✨ Features
- Multilingual Generation: Evaluated in 10 languages and supports generation in multiple languages.
- RAG Capabilities: Highly performant Retrieval Augmented Generation capabilities.
- Grounded Generation: Can generate responses based on supplied document snippets and include citations.
## 📚 Documentation
### Model Summary
Cohere Labs Command-R is a research release of a highly performant 35-billion-parameter generative model. It is a large language model with open weights, optimized for a variety of use cases including reasoning, summarization, and question answering.
Developed by: Cohere and Cohere Labs
### Model Details
- Input: The model takes text as input only.
- Output: The model generates text only.
- Model Architecture: This is an auto-regressive language model that uses an optimized transformer architecture. After pretraining, this model uses supervised fine-tuning (SFT) and preference training to align model behavior to human preferences for helpfulness and safety.
- Languages covered: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. Pre-training data additionally included the following 13 languages: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.
- Context length: Command-R supports a context length of 128K tokens.
### Grounded Generation and RAG Capabilities
Command-R has been specifically trained with grounded generation capabilities. It can generate responses based on a list of supplied document snippets and include grounding spans (citations) in its response indicating the source of the information.

This can be used to enable behaviors such as grounded summarization and the final step of Retrieval Augmented Generation (RAG). This behavior has been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template may reduce performance, but experimentation is encouraged.

Command-R's grounded generation behavior takes a conversation as input (with an optional user-supplied system preamble indicating task, context, and desired output style), along with a list of retrieved document snippets. The document snippets should be chunks rather than long documents, typically around 100-400 words per chunk. Document snippets consist of key-value pairs. The keys should be short descriptive strings; the values can be text or semi-structured.
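To make the snippet format concrete, here is a hypothetical helper (`chunk_document` is not part of the transformers API) that splits a long document into key-value snippets of roughly the recommended size:

```python
def chunk_document(title: str, text: str, max_words: int = 250) -> list[dict]:
    """Split a document into key-value snippet dicts for grounded generation.

    Hypothetical helper: splits on word boundaries so each snippet stays
    within the suggested 100-400 word range.
    """
    words = text.split()
    if len(words) <= max_words:
        return [{"title": title, "text": text}]
    snippets = []
    for part, i in enumerate(range(0, len(words), max_words), start=1):
        snippets.append({
            "title": f"{title} (part {part})",  # short descriptive key
            "text": " ".join(words[i:i + max_words]),
        })
    return snippets
```

Each resulting dict can then be passed in the `documents` list when rendering a grounded generation prompt.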
By default, Command-R will generate grounded responses by first predicting which documents are relevant, then predicting which ones it will cite, then generating an answer, and finally inserting grounding spans into the answer. This is referred to as `accurate` grounded generation.

The model is trained with a number of other answering modes, which can be selected by prompt changes. A `fast` citation mode is supported in the tokenizer, which directly generates an answer with grounding spans in it, without first writing the answer out in full. This sacrifices some grounding accuracy in favor of generating fewer tokens.
Comprehensive documentation for working with Command-R's grounded generation prompt template can be found here.
The code snippet below shows a minimal working example on how to render a prompt.
**Usage: Rendering Grounded Generation prompts**
```python
from transformers import AutoTokenizer

model_id = "CohereLabs/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)

conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]
documents = [
    {"title": "Tall penguins", "text": "Emperor penguins are the tallest growing up to 122 cm in height."},
    {"title": "Penguin habitats", "text": "Emperor penguins only live in Antarctica."}
]

grounded_generation_prompt = tokenizer.apply_grounded_generation_template(
    conversation,
    documents=documents,
    citation_mode="accurate",
    tokenize=False,
    add_generation_prompt=True,
)
print(grounded_generation_prompt)
```
## 🔧 Technical Details
- **Model Architecture**: An auto-regressive language model using an optimized transformer architecture, followed by supervised fine-tuning (SFT) and preference training.
- **Context Length**: Supports a context length of 128K.
- **Grounded Generation Training**: Trained via a mixture of supervised fine-tuning and preference fine-tuning using a specific prompt template.
## 📄 License
- License: [CC-BY-NC](https://cohere.com/c4ai-cc-by-nc-license); use also requires adherence to [Cohere Labs' Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
> ⚠️ **Important Note**
>
> This model is the non-quantized version of Cohere Labs Command-R. You can find the quantized version of Cohere Labs Command-R using bitsandbytes [here](https://huggingface.co/CohereLabs/c4ai-command-r-v01-4bit).