# 🚀 Llamacpp imatrix Quantizations of NeuralDaredevil-8B-abliterated
This project provides Llama.cpp imatrix quantizations of the NeuralDaredevil-8B-abliterated model, offering various quantization options for different performance and quality requirements.
## 📚 Documentation

### Model Information

| Property | Details |
| --- | --- |
| Model Type | Llamacpp imatrix Quantizations of NeuralDaredevil-8B-abliterated |
| Training Data | mlabonne/orpo-dpo-mix-40k |
### Model Performance
The model has been evaluated on several benchmarks. Here are the results:
- AI2 Reasoning Challenge (25-Shot): Normalized accuracy of 69.28%
- HellaSwag (10-Shot): Normalized accuracy of 85.05%
- MMLU (5-Shot): Accuracy of 69.1%
- TruthfulQA (0-shot): mc2 score of 60.0
- Winogrande (5-shot): Accuracy of 78.69%
- GSM8k (5-shot): Accuracy of 71.8%
You can find more details on the Open LLM Leaderboard.
### Quantization Details
The quantizations are created using llama.cpp release b3086. All quants are made using the imatrix option with a dataset from here.
The original model can be found here.
### Prompt Format

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
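If you're scripting against this template, a minimal Python sketch is shown below; the `build_prompt` helper is hypothetical (not part of llama.cpp) and simply fills in the placeholders above:

```python
# Hypothetical helper: fills the Llama 3 prompt template shown above.
# The "\n\n" after each <|end_header_id|> mirrors the blank lines in the template.
def build_prompt(system_prompt: str, prompt: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_prompt("You are a helpful assistant.", "Explain GGUF in one sentence."))
```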
### Download Options
You can download a specific file from the following table:
| Filename | Quant type | File Size | Description |
| --- | --- | --- | --- |
| NeuralDaredevil-8B-abliterated-Q8_0.gguf | Q8_0 | 8.54GB | Extremely high quality, generally unneeded but max available quant. |
| NeuralDaredevil-8B-abliterated-Q6_K.gguf | Q6_K | 6.59GB | Very high quality, near perfect, recommended. |
| NeuralDaredevil-8B-abliterated-Q5_K_M.gguf | Q5_K_M | 5.73GB | High quality, recommended. |
| NeuralDaredevil-8B-abliterated-Q5_K_S.gguf | Q5_K_S | 5.59GB | High quality, recommended. |
| NeuralDaredevil-8B-abliterated-Q4_K_M.gguf | Q4_K_M | 4.92GB | Good quality, uses about 4.83 bits per weight, recommended. |
| NeuralDaredevil-8B-abliterated-Q4_K_S.gguf | Q4_K_S | 4.69GB | Slightly lower quality with more space savings, recommended. |
| NeuralDaredevil-8B-abliterated-IQ4_XS.gguf | IQ4_XS | 4.44GB | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| NeuralDaredevil-8B-abliterated-Q3_K_L.gguf | Q3_K_L | 4.32GB | Lower quality but usable, good for low RAM availability. |
| NeuralDaredevil-8B-abliterated-Q3_K_M.gguf | Q3_K_M | 4.01GB | Even lower quality. |
| NeuralDaredevil-8B-abliterated-IQ3_M.gguf | IQ3_M | 3.78GB | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| NeuralDaredevil-8B-abliterated-Q3_K_S.gguf | Q3_K_S | 3.66GB | Low quality, not recommended. |
| NeuralDaredevil-8B-abliterated-IQ3_XS.gguf | IQ3_XS | 3.51GB | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| NeuralDaredevil-8B-abliterated-IQ3_XXS.gguf | IQ3_XXS | 3.27GB | Lower quality, new method with decent performance, comparable to Q3 quants. |
| NeuralDaredevil-8B-abliterated-Q2_K.gguf | Q2_K | 3.17GB | Very low quality but surprisingly usable. |
| NeuralDaredevil-8B-abliterated-IQ2_M.gguf | IQ2_M | 2.94GB | Very low quality, uses SOTA techniques to also be surprisingly usable. |
| NeuralDaredevil-8B-abliterated-IQ2_S.gguf | IQ2_S | 2.75GB | Very low quality, uses SOTA techniques to be usable. |
| NeuralDaredevil-8B-abliterated-IQ2_XS.gguf | IQ2_XS | 2.60GB | Very low quality, uses SOTA techniques to be usable. |
### Downloading using huggingface-cli

#### Installation

First, make sure you have huggingface-cli installed:

```
pip install -U "huggingface_hub[cli]"
```
#### Download a Specific File

You can target the specific file you want:

```
huggingface-cli download bartowski/NeuralDaredevil-8B-abliterated-GGUF --include "NeuralDaredevil-8B-abliterated-Q4_K_M.gguf" --local-dir ./
```
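If you'd rather download from Python than the CLI, `huggingface_hub` exposes the same functionality; a minimal sketch, with the repo and filename taken from the command above:

```python
from huggingface_hub import hf_hub_download

# Downloads a single quant file into the current directory.
hf_hub_download(
    repo_id="bartowski/NeuralDaredevil-8B-abliterated-GGUF",
    filename="NeuralDaredevil-8B-abliterated-Q4_K_M.gguf",
    local_dir=".",
)
```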
#### Download Split Files

If the model is bigger than 50GB, it will have been split into multiple files. To download them all to a local folder, run:

```
huggingface-cli download bartowski/NeuralDaredevil-8B-abliterated-GGUF --include "NeuralDaredevil-8B-abliterated-Q8_0.gguf/*" --local-dir NeuralDaredevil-8B-abliterated-Q8_0
```

You can either specify a new `--local-dir` (e.g., `NeuralDaredevil-8B-abliterated-Q8_0`) or download them all in place (`./`).
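The Python equivalent of the `--include` pattern is `snapshot_download` with `allow_patterns`; a minimal sketch mirroring the command above:

```python
from huggingface_hub import snapshot_download

# Downloads every shard matching the pattern into the target folder.
snapshot_download(
    repo_id="bartowski/NeuralDaredevil-8B-abliterated-GGUF",
    allow_patterns=["NeuralDaredevil-8B-abliterated-Q8_0.gguf/*"],
    local_dir="NeuralDaredevil-8B-abliterated-Q8_0",
)
```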
### Which File Should I Choose?

Artefact2 provides a great write-up here, with charts comparing the performance of the various quant types.
#### Determine Model Size
The first thing to figure out is how big a model you can run. You'll need to determine how much RAM and/or VRAM you have.
- Fastest Performance: If you want your model running as fast as possible, aim to fit the whole thing on your GPU's VRAM. Choose a quant with a file size 1-2GB smaller than your GPU's total VRAM.
- Maximum Quality: If you want the absolute maximum quality, add your system RAM and your GPU's VRAM together, then select a quant with a file size 1-2GB smaller than that total (a sketch applying this rule follows this list).
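As an illustration of the sizing rule, the sketch below picks the largest quant from the table above that fits a given memory budget with ~1.5GB of headroom; the helper and its name are hypothetical, not a tool shipped with this repo:

```python
# File sizes (GB) copied from the download table above.
QUANT_SIZES_GB = {
    "Q8_0": 8.54, "Q6_K": 6.59, "Q5_K_M": 5.73, "Q5_K_S": 5.59,
    "Q4_K_M": 4.92, "Q4_K_S": 4.69, "IQ4_XS": 4.44, "Q3_K_L": 4.32,
    "Q3_K_M": 4.01, "IQ3_M": 3.78, "Q3_K_S": 3.66, "IQ3_XS": 3.51,
    "IQ3_XXS": 3.27, "Q2_K": 3.17, "IQ2_M": 2.94, "IQ2_S": 2.75,
    "IQ2_XS": 2.60,
}

def largest_fitting_quant(memory_gb: float, headroom_gb: float = 1.5) -> str | None:
    """Return the largest quant whose file fits within memory_gb minus headroom."""
    budget = memory_gb - headroom_gb
    fitting = {q: size for q, size in QUANT_SIZES_GB.items() if size <= budget}
    return max(fitting, key=fitting.get) if fitting else None

# Example: an 8GB GPU leaves a 6.5GB budget, so Q5_K_M (5.73GB) is the pick.
print(largest_fitting_quant(8.0))
```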
#### Choose between 'I-quant' and 'K-quant'
- K-quants: If you don't want to think too much, grab one of the K-quants. These are in the format 'QX_K_X', like Q5_K_M.
- I-quants: If you're aiming for below Q4 and running cuBLAS (Nvidia) or rocBLAS (AMD), you should consider the I-quants. These are in the format IQX_X, like IQ3_M. They are newer and offer better performance for their size.
The I-quants can also be used on CPU and Apple Metal, but they will be slower than their K-quant equivalents.
## ⚠️ Important Note

The I-quants are not compatible with Vulkan, which also supports AMD cards. So, if you have an AMD card, double-check whether you're using the rocBLAS build or the Vulkan build. At the time of writing, LM Studio has a preview with ROCm support, and other inference engines offer specific builds for ROCm.
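To summarize the backend guidance above as a rough, hedged rule of thumb (the function and its name are illustrative, not official llama.cpp logic):

```python
# Illustrative decision rule distilled from the guidance above.
def recommend_quant_family(backend: str, below_q4: bool) -> str:
    backend = backend.lower()
    if backend == "vulkan":
        return "K-quant"  # I-quants are not compatible with Vulkan
    if below_q4 and backend in ("cublas", "rocblas"):
        return "I-quant"  # better quality per byte below Q4 on these backends
    # I-quants also run on CPU and Apple Metal, but slower than K-quants.
    return "K-quant"

print(recommend_quant_family("rocblas", below_q4=True))  # -> "I-quant"
```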
## 💡 Usage Tip
You can check out the llama.cpp feature matrix for more detailed information.
## 📄 License

The license for this project is listed as `other`.
If you want to support the author's work, visit the ko-fi page.