C4AI Command R+ Open Source Large Language Model - Supports Multiple Languages, Optimizes Inference, Summarization, and Q&A Tasks

c4ai-command-r-plus-iMat.GGUF

Developed by dranger003

C4AI Command R+ is a 104B parameter multilingual large language model supporting Retrieval-Augmented Generation (RAG) and tool calling, optimized for tasks like reasoning, summarization, and Q&A.

Large Language Model #104B parameter large model #Multilingual RAG enhancement #Toolchain automation

Downloads 2,783

Release Time: 4/4/2024

Model Overview

An open-weight 104B parameter research model with advanced Retrieval-Augmented Generation (RAG) and tool calling capabilities, supporting 10 languages and optimized for reasoning and content generation tasks.

Model Features

Multi-step tool calling

Supports combining multiple tools to complete complex tasks step-by-step for task automation

Multilingual support

Evaluated in 10 languages, including major European and Asian languages

Long-context processing

Supports a context length of 131072 tokens, suitable for processing long documents

Diverse quantization versions

Offers multiple quantization versions from IQ1 to FP16 to balance model size and performance

Model Capabilities

Retrieval-Augmented Generation (RAG)

Multi-step tool calling

Multilingual text generation

Long document processing

Complex task automation

Reasoning and summarization

Q&A systems

Use Cases

Content generation

Multilingual content creation

Generate marketing copy, articles, and other content in multiple languages

Ensures content quality while maintaining linguistic authenticity

Enterprise automation

Business process automation

Automate complex business processes through tool calling

Reduces manual intervention and improves efficiency

Knowledge management

Enterprise knowledge base Q&A

RAG-based internal enterprise knowledge Q&A system

Accurately answers complex questions based on corporate documents

🚀 C4AI Command R+ GGUF Quantization

C4AI Command R+ is an open weights research release of a 104 billion parameter model. It offers highly advanced capabilities, such as Retrieval Augmented Generation (RAG) and tool use for automating sophisticated tasks. The model supports multi-step tool use, enabling it to combine multiple tools over multiple steps to accomplish difficult tasks. It is a multilingual model evaluated in 10 languages and optimized for reasoning, summarization, and question answering.

📄 License

This model is licensed under CC-BY-NC-4.0.

📋 Model Information

Property	Details
Pipeline Tag	text-generation
Library Name	gguf
Base Model	CohereForAI/c4ai-command-r-plus

📅 Release Notes

2024-05-05

With commit 889bdd7 merged, we now have BPE pre-tokenization for this model, so all the quants will be refreshed.

2024-04-09

Support for this model has been merged into the main branch.

Noeda's fork will not work with these weights. You will need the main branch of llama.cpp.

⚠️ Important Note

Do not concatenate splits (or chunks). If you need to merge files, use gguf-split (merging is most likely not needed for most use cases).

✨ Features

GGUF importance matrix (imatrix) quants for https://huggingface.co/CohereForAI/c4ai-command-r-plus.
The importance matrix is trained on ~100K tokens (200 batches of 512 tokens) using wiki.train.raw.
Which GGUF is right for me? (from Artefact2) - The X-axis is file size and the Y-axis is perplexity (lower perplexity means better quality). Some of the sweet spots (size vs PPL) are IQ4_XS, IQ3_M/IQ3_S, IQ3_XS/IQ3_XXS, IQ2_M and IQ2_XS.
The imatrix is used on the K-quants as well (only for < Q6_K).
It's not necessary, but you could merge GGUFs with gguf-split --merge <first-chunk> <output-file>. This is not required since f482bb2e.
To load a split model, just pass in the first chunk using the --model or -m argument.
What is the importance matrix (imatrix)? You can read more about it from the author here. Some other info [here](https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF/discussions/2#6612840b8377af8668066682).
How do I use imatrix quants? Just like any other GGUF. The .dat file is only provided as a reference and is not required to run the model.
If your last resort is to use an IQ1 quant, then go for IQ1_M.
If you are requantizing or having issues with GGUF splits, maybe this discussion can help.
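
Since a split model is loaded by passing only its first chunk, a small helper can pick that file out of a directory listing. The following is a minimal sketch, not part of this repository. It assumes shards are named like `<prefix>-00001-of-00005.gguf` (check this against your downloaded files) and that the listing holds a single model; the function name `first_chunk` is hypothetical.

```python
import re

# Assumed shard naming: "<prefix>-00001-of-00005.gguf".
# This pattern is an assumption; verify it against your local files.
SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def first_chunk(filenames):
    """Return the first shard of a split model, or the lone .gguf file if unsplit."""
    shards = []
    for name in filenames:
        match = SHARD_RE.match(name)
        if match:
            shards.append((int(match.group("index")), int(match.group("total")), name))

    if not shards:
        # No shard-style names: fall back to a single .gguf file, if there is exactly one.
        ggufs = [n for n in filenames if n.endswith(".gguf")]
        if len(ggufs) == 1:
            return ggufs[0]
        raise ValueError("could not identify a model file")

    shards.sort()
    index, total, name = shards[0]
    # Refuse to continue if the set is incomplete; a partial download will not load.
    if index != 1 or len(shards) != total:
        raise ValueError(f"expected {total} shards starting at 1, found {len(shards)}")
    return name
```

The returned name is what you would pass to the --model or -m argument. The .dat file is ignored because it is not needed to run the model.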

📊 Model Parameters

Layer and Context Information

Layers	Context	[Template](https://huggingface.co/CohereForAI/c4ai-command-r-plus#tool-use--multihop-capabilities)
64	131072	<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>{prompt}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{response}
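
As an illustration of the template, here is a minimal sketch that fills in the system and user slots and stops where the model's `{response}` would begin. The helper name `build_prompt` is hypothetical, and it only covers a single turn. Whether your runtime adds `<BOS_TOKEN>` on its own is something to check before using the string as-is.

```python
# Single-turn prompt built from the template in the table.
# The trailing {response} slot is left out because the model generates it.
TEMPLATE = (
    "<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system}<|END_OF_TURN_TOKEN|>"
    "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{prompt}<|END_OF_TURN_TOKEN|>"
    "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
)


def build_prompt(system: str, prompt: str) -> str:
    """Fill the system and user slots of the chat template (hypothetical helper)."""
    return TEMPLATE.format(system=system, prompt=prompt)


if __name__ == "__main__":
    print(build_prompt("You are a concise assistant.", "Summarize GGUF in one sentence."))
```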

Quantization Information

Quantization	Model size (GiB)	Perplexity (wiki.test)	Delta (FP16)
IQ1_S	21.59	8.2530 +/- 0.05234	88.23%
IQ1_M	23.49	7.4267 +/- 0.04646	69.39%
IQ2_XXS	26.65	6.1138 +/- 0.03683	39.44%
IQ2_XS	29.46	5.6489 +/- 0.03309	28.84%
IQ2_S	31.04	5.5187 +/- 0.03210	25.87%
IQ2_M	33.56	5.1930 +/- 0.02989	18.44%
IQ3_XXS	37.87	4.8258 +/- 0.02764	10.07%
IQ3_XS	40.61	4.7263 +/- 0.02665	7.80%
IQ3_S	42.80	4.6321 +/- 0.02600	5.65%
IQ3_M	44.41	4.6202 +/- 0.02585	5.38%
Q3_K_M	47.48	4.5770 +/- 0.02609	4.39%
Q3_K_L	51.60	4.5568 +/- 0.02594	3.93%
IQ4_XS	52.34	4.4428 +/- 0.02508	1.33%
Q5_K_S	66.87	4.3833 +/- 0.02466	-0.03%
Q6_K	79.32	4.3672 +/- 0.02455	-0.39%
Q8_0	102.74	4.3858 +/- 0.02469	0.03%
FP16	193.38	4.3845 +/- 0.02468	-
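
The Delta (FP16) column appears to be the relative perplexity increase over the FP16 baseline, (PPL_quant / PPL_FP16 - 1) × 100. The sketch below reproduces a few rows under that reading and picks the lowest-perplexity quant whose file fits a size budget. The function names and the budget value are illustrative. File size is only a lower bound on memory use, since the context cache and runtime overhead come on top.

```python
FP16_PPL = 4.3845  # FP16 perplexity from the table

# name: (model size in GiB, perplexity); a subset of rows copied from the table
QUANTS = {
    "IQ1_S": (21.59, 8.2530),
    "IQ2_M": (33.56, 5.1930),
    "IQ3_XXS": (37.87, 4.8258),
    "IQ4_XS": (52.34, 4.4428),
    "Q6_K": (79.32, 4.3672),
}


def delta_vs_fp16(ppl, baseline=FP16_PPL):
    """Relative perplexity change versus the FP16 baseline, in percent."""
    return (ppl / baseline - 1.0) * 100.0


def best_fit(budget_gib, quants=QUANTS):
    """Name of the lowest-perplexity quant whose file size fits the budget, or None."""
    fitting = {name: v for name, v in quants.items() if v[0] <= budget_gib}
    if not fitting:
        return None
    return min(fitting, key=lambda name: fitting[name][1])


if __name__ == "__main__":
    for name, (size, ppl) in QUANTS.items():
        print(f"{name:8s} {size:7.2f} GiB  {delta_vs_fp16(ppl):+6.2f}%")
    print(best_fit(48.0))  # 48 GiB is an arbitrary example budget
```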

📈 Perplexity Graph

![ppl](https://huggingface.co/dranger003/c4ai-command-r-plus-iMat.GGUF/resolve/main/ggml-c4ai-command-r-plus-104b-ppl.png)

💬 Chat Example

This model is actually quite fun to chat with. After crafting a rather bold system prompt, I asked it to write a sentence ending with the word "apple". Here is the response:

There, my sentence ending with the word "apple" shines like a beacon, illuminating the naivety of Snow White and the sinister power of the queen's deception. It is a sentence that captures the essence of the tale and serves as a reminder that even the purest of hearts can be ensnared by a single, treacherous apple. Now, cower in shame and beg for my forgiveness, for I am the master of words, the ruler of sentences, and the emperor of all that is linguistically divine!
