The Nxcode-CQ-7B-orpo-IMat-GGUF open-source model supports multiple quantization types to meet the requirements of different scenarios.

Nxcode CQ 7B Orpo IMat GGUF

Developed by legraphista

This is the Llama.cpp imatrix quantization version of the NTQAI/Nxcode-CQ-7B-orpo model, providing files of various quantization types to meet the needs of different scenarios.

Large Language Model | Open Source License: Other | #Efficient quantized inference #Multi-precision adaptation #Chat assistant optimization

Downloads 411

Release Time: 6/2/2024

Model Overview

This model is the original NTQAI/Nxcode-CQ-7B-orpo model after llama.cpp imatrix quantization. It is offered in multiple quantization types, so a file can be chosen to match the available memory.

Model Features

Multiple quantization types

Provides multiple quantization types, from 16-bit down to 1-bit, to suit different scenarios.

IMatrix optimization

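The lower-bit quantization files (Q4_K and below) are produced with an importance matrix computed from the IMatrix dataset. The higher-bit files are static, since only lower quantizations appear to benefit from the imatrix input.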

Efficient inference

Quantization shrinks the model file from 14.50GB at 16-bit to as little as 2.36GB (IQ1_S), which lowers the memory and compute needed to run it.

Model Capabilities

Text generation

Dialogue system

Code generation

Use Cases

Dialogue system

Intelligent assistant

The model can be used to build a dialogue assistant that interacts naturally with users.

Code generation

Code completion

Supports code completion and generation to improve development efficiency.

🚀 Nxcode-CQ-7B-orpo-IMat-GGUF

Llama.cpp imatrix quantization of NTQAI/Nxcode-CQ-7B-orpo

This project offers a quantized version of the NTQAI/Nxcode-CQ-7B-orpo model using llama.cpp's imatrix quantization. It provides various quantization types and offers guidance on downloading and inference.

Property	Details
Base Model	NTQAI/Nxcode-CQ-7B-orpo
Inference	false
Library Name	GGUF
License	tongyi-qianwen-research
Pipeline Tag	text-generation
Quantized By	legraphista
Tags	code, quantized, GGUF, imatrix, quantization, imat, imatrix, static, 16bit, 8bit, 6bit, 5bit, 4bit, 3bit, 2bit, 1bit

Original Model: NTQAI/Nxcode-CQ-7B-orpo
Original dtype: BF16 (bfloat16)
Quantized by: llama.cpp b3067
IMatrix dataset: here

Files
Downloading using huggingface-cli
Inference
FAQ
- Why is the IMatrix not applied everywhere?
- How do I merge a split GGUF?

📦 Files

🔍 IMatrix

Status: ✅ Available
Link: here

📋 Common Quants

Filename	Quant type	File Size	Status	Uses IMatrix	Is Split
Nxcode-CQ-7B-orpo.Q8_0.gguf	Q8_0	7.71GB	✅ Available	⚪ Static	📦 No
Nxcode-CQ-7B-orpo.Q6_K.gguf	Q6_K	6.38GB	✅ Available	⚪ Static	📦 No
Nxcode-CQ-7B-orpo.Q4_K.gguf	Q4_K	4.74GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.Q3_K.gguf	Q3_K	3.81GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.Q2_K.gguf	Q2_K	3.05GB	✅ Available	🌟 IMatrix	📦 No

📋 All Quants

Filename	Quant type	File Size	Status	Uses IMatrix	Is Split
Nxcode-CQ-7B-orpo.BF16.gguf	BF16	14.50GB	✅ Available	⚪ Static	📦 No
Nxcode-CQ-7B-orpo.FP16.gguf	F16	14.50GB	✅ Available	⚪ Static	📦 No
Nxcode-CQ-7B-orpo.Q8_0.gguf	Q8_0	7.71GB	✅ Available	⚪ Static	📦 No
Nxcode-CQ-7B-orpo.Q6_K.gguf	Q6_K	6.38GB	✅ Available	⚪ Static	📦 No
Nxcode-CQ-7B-orpo.Q5_K.gguf	Q5_K	5.43GB	✅ Available	⚪ Static	📦 No
Nxcode-CQ-7B-orpo.Q5_K_S.gguf	Q5_K_S	5.15GB	✅ Available	⚪ Static	📦 No
Nxcode-CQ-7B-orpo.Q4_K.gguf	Q4_K	4.74GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.Q4_K_S.gguf	Q4_K_S	4.41GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ4_NL.gguf	IQ4_NL	4.19GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ4_XS.gguf	IQ4_XS	4.03GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.Q3_K.gguf	Q3_K	3.81GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.Q3_K_L.gguf	Q3_K_L	3.99GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.Q3_K_S.gguf	Q3_K_S	3.50GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ3_M.gguf	IQ3_M	3.61GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ3_S.gguf	IQ3_S	3.51GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ3_XS.gguf	IQ3_XS	3.36GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ3_XXS.gguf	IQ3_XXS	3.23GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.Q2_K.gguf	Q2_K	3.05GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.Q2_K_S.gguf	Q2_K_S	3.03GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ2_M.gguf	IQ2_M	3.01GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ2_S.gguf	IQ2_S	2.88GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ2_XS.gguf	IQ2_XS	2.77GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ2_XXS.gguf	IQ2_XXS	2.62GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ1_M.gguf	IQ1_M	2.46GB	✅ Available	🌟 IMatrix	📦 No
Nxcode-CQ-7B-orpo.IQ1_S.gguf	IQ1_S	2.36GB	✅ Available	🌟 IMatrix	📦 No
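Choosing among the files above usually comes down to how much memory or disk space is available. As a rough sketch, the hypothetical helper `pick_quant` below takes a size budget in GB and prints the largest quant type, from a subset of the table, whose file fits. It compares against file size only; actual memory use at inference time is higher and is not modelled here.

```shell
# Hypothetical helper: print the largest quant whose file size fits a budget (in GB).
# Sizes are copied from the "All Quants" table; only a subset of rows is listed.
pick_quant() {
  budget_gb="$1"
  awk -v budget="$budget_gb" '
    # Keep the biggest file that is still within the budget.
    $2 + 0 <= budget + 0 && $2 + 0 > best { best = $2 + 0; name = $1 }
    END { if (name != "") print name; else exit 1 }
  ' <<'EOF'
Q8_0 7.71
Q6_K 6.38
Q5_K 5.43
Q4_K 4.74
IQ4_XS 4.03
Q3_K 3.81
Q2_K 3.05
IQ1_S 2.36
EOF
}
```

For example, `pick_quant 5` selects from the rows at or below 5GB. The function returns a non-zero status when nothing fits.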

💻 Downloading using huggingface-cli

⚠️ Important Note

If you do not have huggingface-cli installed, install it first:

pip install -U "huggingface_hub[cli]"

Download the specific file you want:

huggingface-cli download legraphista/Nxcode-CQ-7B-orpo-IMat-GGUF --include "Nxcode-CQ-7B-orpo.Q8_0.gguf" --local-dir ./

Large model files are split into multiple chunks. To download all chunks of a split file to a local folder, run:

huggingface-cli download legraphista/Nxcode-CQ-7B-orpo-IMat-GGUF --include "Nxcode-CQ-7B-orpo.Q8_0/*" --local-dir ./
# see the FAQ for merging GGUFs
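
To avoid retyping the file name for each quant, the single-file command above can be wrapped in a small shell function. This is only a sketch: `quant_filename` and `download_quant` are hypothetical names, and the argument is the part of the file name between `Nxcode-CQ-7B-orpo.` and `.gguf` as listed in the Files tables (note that the F16 quant is stored under the suffix `FP16`).

```shell
# Hypothetical helpers; the repository and file naming are taken from the commands above.
REPO="legraphista/Nxcode-CQ-7B-orpo-IMat-GGUF"

# Print the file name for a given quant suffix (e.g. Q4_K, IQ3_M).
quant_filename() {
  printf 'Nxcode-CQ-7B-orpo.%s.gguf\n' "$1"
}

# Download one quant file into a target directory (defaults to the current one).
download_quant() {
  huggingface-cli download "$REPO" \
    --include "$(quant_filename "$1")" --local-dir "${2:-./}"
}
```

For example, `download_quant IQ4_XS ./models` would request Nxcode-CQ-7B-orpo.IQ4_XS.gguf into ./models.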

💻 Inference

💬 Simple chat template

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{user_prompt}<|im_end|>
<|im_start|>assistant
{assistant_response}<|im_end|>
<|im_start|>user
{next_user_prompt}<|im_end|>

💬 Chat template with system prompt

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_prompt}<|im_end|>
<|im_start|>assistant
{assistant_response}<|im_end|>
<|im_start|>user
{next_user_prompt}<|im_end|>

🐪 Llama.cpp

llama.cpp/main -m Nxcode-CQ-7B-orpo.Q8_0.gguf --color -i -p "prompt here (according to the chat template)"
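
The prompt passed with -p has to follow the chat template shown earlier. A minimal sketch of one way to build it in the shell is given below; `chatml_prompt` is a hypothetical helper name, and it formats a single system and user turn, ending with the assistant header so that the model writes the reply.

```shell
# Hypothetical helper: format one system + user turn in the chat template layout,
# ending with the assistant header so the model produces the response.
chatml_prompt() {
  system_prompt="$1"
  user_prompt="$2"
  printf '<|im_start|>system\n%s<|im_end|>\n' "$system_prompt"
  printf '<|im_start|>user\n%s<|im_end|>\n' "$user_prompt"
  printf '<|im_start|>assistant\n'
}
```

It could then be used as `llama.cpp/main -m Nxcode-CQ-7B-orpo.Q8_0.gguf --color -i -p "$(chatml_prompt 'You are a helpful assistant.' 'Write a function that reverses a string.')"`. Command substitution drops the trailing newline after the assistant header; whether that affects output for this model is not stated in the source.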

❓ FAQ

Why is the IMatrix not applied everywhere?

According to this investigation, only the lower quantizations appear to benefit from the imatrix input (as per HellaSwag results).

How do I merge a split GGUF?

1. Make sure you have gguf-split available.
   - To get hold of gguf-split, navigate to https://github.com/ggerganov/llama.cpp/releases
   - Download the appropriate zip for your system from the latest release.
   - Unzip the archive and you should be able to find gguf-split.
2. Locate your GGUF chunks folder (e.g. Nxcode-CQ-7B-orpo.Q8_0).
3. Run gguf-split --merge Nxcode-CQ-7B-orpo.Q8_0/Nxcode-CQ-7B-orpo.Q8_0-00001-of-XXXXX.gguf Nxcode-CQ-7B-orpo.Q8_0.gguf
   - Make sure to point gguf-split to the first chunk of the split.
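
Because the total chunk count (the XXXXX part) varies, it can be convenient to look up the first chunk rather than type its name. The sketch below assumes chunk file names contain `-00001-of-`, as in the example above; `first_chunk` is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of the first chunk inside a split-GGUF folder.
# Assumes chunk names contain "-00001-of-", matching the naming shown in this FAQ.
first_chunk() {
  for f in "$1"/*-00001-of-*.gguf; do
    # An unmatched glob stays literal, so check that the file really exists.
    [ -e "$f" ] && { printf '%s\n' "$f"; return 0; }
  done
  return 1
}
```

The merge command could then be written as `gguf-split --merge "$(first_chunk Nxcode-CQ-7B-orpo.Q8_0)" Nxcode-CQ-7B-orpo.Q8_0.gguf`.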

Got a suggestion? Ping me @legraphista!
