C4AI Command R+ FP8
This repository provides the c4ai-command-r-plus model quantized to FP8 by FriendliAI. The quantization significantly boosts inference efficiency while maintaining high accuracy.
Features
- Quantization: The model is quantized to FP8 by FriendliAI, enhancing inference efficiency.
- Compatibility: Compatible with Friendli Container.
- Multilingual Support: Supports multiple languages including English, French, German, Spanish, Italian, Portuguese, Japanese, Korean, Chinese, and Arabic.
Installation
- Sign up: Sign up for Friendli Suite. You can use Friendli Containers free of charge for four weeks.
- Prepare PAT: Prepare a Personal Access Token following this guide.
- Prepare Secret: Prepare a Friendli Container Secret following this guide.
- Install the Hugging Face CLI:
  ```shell
  pip install -U "huggingface_hub[cli]"
  ```
Preparing Personal Access Token
- Sign in to Friendli Suite.
- Go to User Settings > Tokens and click 'Create new token'.
- Save your created token value.
Pulling Friendli Container Image
- Log in to the Docker client using the personal access token:
  ```shell
  export FRIENDLI_PAT="YOUR PAT"
  docker login registry.friendli.ai -u $YOUR_EMAIL -p $FRIENDLI_PAT
  ```
- Pull the image:
  ```shell
  docker pull registry.friendli.ai/trial
  ```
Usage Examples
Running Friendli Container
```shell
docker run \
  --gpus '"device=0,1,2,3"' \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e FRIENDLI_CONTAINER_SECRET="YOUR CONTAINER SECRET" \
  registry.friendli.ai/trial \
  --web-server-port 8000 \
  --hf-model-name FriendliAI/c4ai-command-r-plus-fp8 \
  --num-devices 4
```
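Once the container is up, it can be queried over HTTP. Below is a minimal sketch, assuming the container exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the mapped port; the `chat` helper name is hypothetical, introduced here for illustration:

```python
import json
from urllib.request import Request, urlopen

# Request body in the OpenAI chat-completions format, targeting the FP8 model
payload = {
    "model": "FriendliAI/c4ai-command-r-plus-fp8",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100,
    "temperature": 0.3,
}

def chat(payload: dict, url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    # POST the JSON payload to the container and decode the JSON response
    req = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())
```

If the endpoint follows the OpenAI response format, the reply text of `chat(payload)` sits at `choices[0]["message"]["content"]`.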
Documentation
Original model card: CohereForAI's C4AI Command R+
C4AI Command R+ is an open-weights research release of a 104-billion-parameter model with advanced capabilities such as Retrieval-Augmented Generation (RAG) and tool use. It is a multilingual model optimized for a variety of use cases.
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format the message with the Command R+ chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
Quantized model through bitsandbytes, 8-bit precision
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Load the weights in 8-bit precision via bitsandbytes
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Format the message with the Command R+ chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
Quantized model through bitsandbytes, 4-bit precision
This model is the non-quantized version of C4AI Command R+. You can find the quantized version of C4AI Command R+ using bitsandbytes here.
Model Details
- Input: The model takes text as input.
- Output: The model generates text.
- Model Architecture: It's an auto-regressive language model using an optimized transformer architecture. After pretraining, it uses supervised fine-tuning (SFT) and preference training.
- Languages covered: Optimized for English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic. Pre-training data also included 13 other languages.
- Context length: Supports a context length of 128K.
Evaluations
Tool use & multihop capabilities
Command R+ has conversational tool use capabilities trained via supervised and preference fine-tuning. It takes a conversation and a list of tools as input and generates a JSON-formatted list of actions. It can recognize a directly_answer tool.
```python
from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)

conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]

# Tool definitions passed to the tool-use chat template
tools = [
    {
        "name": "internet_search",
        "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
        "parameter_definitions": {
            "query": {
                "description": "Query to search the internet with",
                "type": "str",
                "required": True
            }
        }
    },
    {
        "name": "directly_answer",
        "description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
        "parameter_definitions": {}
    }
]

# Render the prompt with the tool-use chat template
formatted_input = tokenizer.apply_tool_use_template(conversation, tools=tools, tokenize=False, add_generation_prompt=True)
print(formatted_input)
```
Technical Details
- Model creator: CohereForAI
- Original model: c4ai-command-r-plus
- Quantized by: FriendliAI
- Model type: Text generation
- Library name: transformers
- Supported languages: English, French, German, Spanish, Italian, Portuguese, Japanese, Korean, Chinese, Arabic
License
This model is under the CC-BY-NC license and requires adhering to C4AI's Acceptable Use Policy.