
Mistral NeMo Minitron 8B Base IMat GGUF

Quantized by legraphista (base model developed by NVIDIA)

This repository provides llama.cpp imatrix quantizations of the nvidia/Mistral-NeMo-Minitron-8B-Base model in GGUF format, offering more options for using and deploying the model.

Category: Large Language Model | Open source | License: Other | Tags: multi-precision quantization, lightweight deployment, text generation optimization

Downloads 1,115

Release time: 8/21/2024

Model Overview

This model is a quantized version of NVIDIA's Mistral-NeMo-Minitron-8B-Base model, mainly used for text generation tasks.

Model Features

Multiple quantization options

Provides quantized versions ranging from 16-bit down to 1-bit to suit different hardware and performance requirements.

IMatrix quantization technology

Uses llama.cpp's importance-matrix (imatrix) quantization to better preserve model quality at low bit widths.

GGUF format support

Distributed in the GGUF format for straightforward deployment with llama.cpp on a variety of devices.

Model Capabilities

Text generation

Quantized model inference

Use Cases

Text generation

General text generation

Can be used to generate various types of text content

Edge device deployment

Run on low-resource devices

The smaller quantized variants allow the model to run on devices with limited memory.

🚀 Mistral-NeMo-Minitron-8B-Base-IMat-GGUF

Llama.cpp imatrix quantization of nvidia/Mistral-NeMo-Minitron-8B-Base

This project provides a quantized version of the nvidia/Mistral-NeMo-Minitron-8B-Base model using llama.cpp's imatrix quantization. It offers various quantization types in the GGUF format for different usage scenarios.

Model Information

Property	Details
Base Model	nvidia/Mistral-NeMo-Minitron-8B-Base
Original dtype	`BF16` (`bfloat16`)
Quantized with	llama.cpp b3613
IMatrix dataset	here
Library Name	gguf
License	nvidia-open-model-license
Pipeline Tag	text-generation
Quantized By	legraphista
Tags	quantized, GGUF, quantization, imat, imatrix, static, 16bit, 8bit, 6bit, 5bit, 4bit, 3bit, 2bit, 1bit

🚀 Quick Start

Downloading using huggingface-cli

If you do not have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Download the specific file you want:

huggingface-cli download legraphista/Mistral-NeMo-Minitron-8B-Base-IMat-GGUF --include "Mistral-NeMo-Minitron-8B-Base.Q8_0.gguf" --local-dir ./
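A downloaded file can be sanity-checked before loading it: every GGUF file begins with the four ASCII bytes `GGUF`, so an interrupted download or a saved HTML error page fails this test. The `is_gguf` helper below is a hypothetical convenience function, not part of huggingface-cli or llama.cpp, and it assumes `head -c` is available (GNU and BSD coreutils both provide it).

```shell
# Hypothetical helper: succeed only if the file starts with the GGUF magic.
is_gguf() {
  # Missing files are rejected outright.
  [ -f "$1" ] || return 1
  # Compare the first 4 bytes with the ASCII magic "GGUF".
  [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Usage:
# is_gguf Mistral-NeMo-Minitron-8B-Base.Q8_0.gguf && echo "looks like a GGUF file"
```

This only checks the header, not the full file; it is a quick guard, not an integrity check.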

Large quants may be split into multiple files (none of the files listed in the tables below is split). To download all parts of a split quant into a local folder, run:

huggingface-cli download legraphista/Mistral-NeMo-Minitron-8B-Base-IMat-GGUF --include "Mistral-NeMo-Minitron-8B-Base.Q8_0/*" --local-dir ./
# see the FAQ for merging split GGUF files
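The two download commands above differ only in the `--include` pattern: a single-file quant is `<model>.<label>.gguf`, while a split quant is a folder `<model>.<label>/`. The hypothetical `include_pattern` helper below builds that pattern from the label in the Filename column (e.g. `Q4_K`, `IQ3_M`, or `FP16`, which is the filename label for the F16 quant), so any quant from the tables can be fetched the same way.

```shell
# Hypothetical helper: print the --include pattern for one quant.
# $1 = label as it appears in the Filename column (e.g. Q4_K, FP16)
# $2 = "yes" for a split quant (folder of chunks), anything else for one file
REPO="legraphista/Mistral-NeMo-Minitron-8B-Base-IMat-GGUF"
MODEL="Mistral-NeMo-Minitron-8B-Base"

include_pattern() {
  if [ "$2" = "yes" ]; then
    # Split quants live in a folder named after the quant.
    printf '%s.%s/%s\n' "$MODEL" "$1" '*'
  else
    printf '%s.%s.gguf\n' "$MODEL" "$1"
  fi
}

# Usage (commented out because it downloads several GB):
# huggingface-cli download "$REPO" --include "$(include_pattern Q4_K no)" --local-dir ./
```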

Inference

Llama.cpp

llama.cpp/llama-cli -m Mistral-NeMo-Minitron-8B-Base.Q8_0.gguf --color -i -p "prompt here"
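Depending on the llama.cpp version, the command-line binary is named `main` (older builds) or `llama-cli` (builds since mid-2024, which includes b3613 used for these quants); newer builds may still ship a `main` stub that only prints a deprecation warning. The hypothetical `llama_cli_path` helper below prefers `llama-cli` and falls back to `main`. The directory argument is an assumption about your build layout: typically the repository root for `make` builds or `build/bin` for CMake builds.

```shell
# Hypothetical helper: print the path of the llama.cpp CLI in a build dir.
# Prefers llama-cli (current name) over main (pre-rename name).
llama_cli_path() {
  for name in llama-cli main; do
    if [ -x "$1/$name" ]; then
      printf '%s\n' "$1/$name"
      return 0
    fi
  done
  # Neither binary found or executable.
  return 1
}

# Usage:
# "$(llama_cli_path llama.cpp)" -m Mistral-NeMo-Minitron-8B-Base.Q8_0.gguf --color -i -p "prompt here"
```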

✨ Features

Multiple Quantization Types: Offers a wide range of quantization types, from 16-bit (BF16, F16) through 8, 6, 5, 4, 3 and 2 bits down to 1-bit (IQ1_M, IQ1_S), providing flexibility for different hardware and performance requirements.
IMatrix Quantization: All 4-bit and lower quants are built with the importance matrix, which the investigation cited in the FAQ found to help only at low bit widths (based on HellaSwag results); the higher-bit quants are static.

📦 Installation

Installation consists of downloading the model files with huggingface-cli, as described in the Quick Start section, and having a llama.cpp build available for inference.

📚 Documentation

Files

IMatrix

Status: ✅ Available
Link: here

Common Quants

Filename	Quant type	File Size	Status	Uses IMatrix	Is Split
Mistral-NeMo-Minitron-8B-Base.Q8_0.gguf	Q8_0	8.95GB	✅ Available	⚪ Static	📦 No
Mistral-NeMo-Minitron-8B-Base.Q6_K.gguf	Q6_K	6.91GB	✅ Available	⚪ Static	📦 No
Mistral-NeMo-Minitron-8B-Base.Q4_K.gguf	Q4_K	5.15GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.Q3_K.gguf	Q3_K	4.21GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.Q2_K.gguf	Q2_K	3.33GB	✅ Available	🌟 IMatrix	📦 No
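The file sizes can be turned into an approximate average bits-per-weight figure by comparing each quant with the 16-bit BF16 file (16.84GB in the All Quants table): bpw ≈ 16 × size_quant / size_BF16. This is a rough estimate that ignores metadata and assumes every BF16 weight takes exactly 16 bits. As a sanity check, Q8_0 comes out at about 8.5 bpw, which matches its block layout (32 int8 weights plus one fp16 scale per block). Low-bit quants usually land above their nominal width, likely because llama.cpp keeps some tensors at higher precision.

```shell
# Estimate average bits per weight from a file size in GB,
# using the BF16 file (16.84 GB, 16 bits per weight) as the reference.
bpw() {
  awk -v q="$1" -v ref="16.84" 'BEGIN { printf "%.2f\n", 16 * q / ref }'
}

# Usage:
# bpw 8.95   # Q8_0, prints 8.50
# bpw 5.15   # Q4_K, prints 4.89
```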

All Quants

Filename	Quant type	File Size	Status	Uses IMatrix	Is Split
Mistral-NeMo-Minitron-8B-Base.BF16.gguf	BF16	16.84GB	✅ Available	⚪ Static	📦 No
Mistral-NeMo-Minitron-8B-Base.FP16.gguf	F16	16.84GB	✅ Available	⚪ Static	📦 No
Mistral-NeMo-Minitron-8B-Base.Q8_0.gguf	Q8_0	8.95GB	✅ Available	⚪ Static	📦 No
Mistral-NeMo-Minitron-8B-Base.Q6_K.gguf	Q6_K	6.91GB	✅ Available	⚪ Static	📦 No
Mistral-NeMo-Minitron-8B-Base.Q5_K.gguf	Q5_K	6.00GB	✅ Available	⚪ Static	📦 No
Mistral-NeMo-Minitron-8B-Base.Q5_K_S.gguf	Q5_K_S	5.86GB	✅ Available	⚪ Static	📦 No
Mistral-NeMo-Minitron-8B-Base.Q4_K.gguf	Q4_K	5.15GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.Q4_K_S.gguf	Q4_K_S	4.91GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ4_NL.gguf	IQ4_NL	4.90GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ4_XS.gguf	IQ4_XS	4.66GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.Q3_K.gguf	Q3_K	4.21GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.Q3_K_L.gguf	Q3_K_L	4.54GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.Q3_K_S.gguf	Q3_K_S	3.83GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ3_M.gguf	IQ3_M	3.98GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ3_S.gguf	IQ3_S	3.86GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ3_XS.gguf	IQ3_XS	3.68GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ3_XXS.gguf	IQ3_XXS	3.43GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.Q2_K.gguf	Q2_K	3.33GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.Q2_K_S.gguf	Q2_K_S	3.13GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ2_M.gguf	IQ2_M	3.10GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ2_S.gguf	IQ2_S	2.90GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ2_XS.gguf	IQ2_XS	2.73GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ2_XXS.gguf	IQ2_XXS	2.51GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ1_M.gguf	IQ1_M	2.27GB	✅ Available	🌟 IMatrix	📦 No
Mistral-NeMo-Minitron-8B-Base.IQ1_S.gguf	IQ1_S	2.12GB	✅ Available	🌟 IMatrix	📦 No
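A common way to choose among these files is to take the largest one that fits the memory you can spare. The hypothetical `pick_quant` helper below does that using the sizes copied from the table above. The file size is only a lower bound on memory use, so the optional headroom argument is your own allowance for context (KV cache) and runtime overhead; no particular value is implied here. It prints the Filename-column label, so the result can be plugged straight into a download command.

```shell
# Hypothetical helper: print the largest quant whose file fits a budget.
# Usage: pick_quant <budget_GB> [headroom_GB]   (headroom defaults to 0)
# Exits with status 1 if no quant fits.
pick_quant() {
  awk -v budget="$1" -v headroom="${2:-0}" '
    # Rows are sorted by file size, largest first: print the first that fits.
    $2 <= budget - headroom { print $1; found = 1; exit }
    END { if (!found) exit 1 }
  ' <<'EOF'
BF16 16.84
FP16 16.84
Q8_0 8.95
Q6_K 6.91
Q5_K 6.00
Q5_K_S 5.86
Q4_K 5.15
Q4_K_S 4.91
IQ4_NL 4.90
IQ4_XS 4.66
Q3_K_L 4.54
Q3_K 4.21
IQ3_M 3.98
IQ3_S 3.86
Q3_K_S 3.83
IQ3_XS 3.68
IQ3_XXS 3.43
Q2_K 3.33
Q2_K_S 3.13
IQ2_M 3.10
IQ2_S 2.90
IQ2_XS 2.73
IQ2_XXS 2.51
IQ1_M 2.27
IQ1_S 2.12
EOF
}

# Usage:
# pick_quant 8 1.5   # 6.5 GB usable, prints Q5_K
```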

🔧 Technical Details

The quantization process is based on llama.cpp's b3613 release. The IMatrix quantization is applied selectively based on the results of this investigation, which suggests that lower quantizations benefit the most from the IMatrix input.
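The selective policy visible in the tables (16-bit files as plain conversions, Q8_0 through Q5_K_S static, every 4-bit and lower quant with the importance matrix) can be sketched as a dry run that only prints llama.cpp commands. This is an illustration under assumptions, not the author's actual build script: `llama-quantize` with `--imatrix` and `llama-imatrix` are real llama.cpp tools, but the file names `imatrix.dat` and `calibration.txt` are placeholders (the real calibration data is the IMatrix dataset linked in Model Information).

```shell
# Hypothetical dry-run sketch: print the llama-quantize command for one quant,
# adding --imatrix only for the low-bit types, as in the tables.
MODEL="Mistral-NeMo-Minitron-8B-Base"
SRC="$MODEL.BF16.gguf"   # placeholder: 16-bit source GGUF
IMATRIX="imatrix.dat"    # placeholder: precomputed importance matrix

quantize_cmd() {
  case "$1" in
    BF16|F16|FP16)
      echo "16-bit files are conversions, not quantizations" >&2
      return 1 ;;
    Q8_0|Q6_K|Q5_K|Q5_K_S)
      # Static quant: no importance matrix.
      printf 'llama-quantize %s %s.%s.gguf %s\n' "$SRC" "$MODEL" "$1" "$1" ;;
    *)
      # Low-bit quant: pass the importance matrix.
      printf 'llama-quantize --imatrix %s %s %s.%s.gguf %s\n' \
        "$IMATRIX" "$SRC" "$MODEL" "$1" "$1" ;;
  esac
}

# The importance matrix would be computed beforehand with something like:
#   llama-imatrix -m "$SRC" -f calibration.txt -o "$IMATRIX"
# Print a plan without running anything:
#   for q in Q8_0 Q6_K Q4_K IQ2_M; do quantize_cmd "$q"; done
```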

📄 License

This project is licensed under the nvidia-open-model-license.

📚 FAQ

Why is the IMatrix not applied everywhere?

According to this investigation, lower quantizations appear to be the only ones that benefit from the imatrix input (based on HellaSwag results).

How do I merge a split GGUF?

1. Make sure you have gguf-split available:
   - To get hold of gguf-split, go to https://github.com/ggerganov/llama.cpp/releases
   - Download the appropriate zip for your system from the latest release
   - Unzip the archive; gguf-split should be inside (in releases since mid-2024 the binary is named llama-gguf-split)
2. Locate your GGUF chunks folder (e.g. Mistral-NeMo-Minitron-8B-Base.Q8_0)
3. Run: gguf-split --merge Mistral-NeMo-Minitron-8B-Base.Q8_0/Mistral-NeMo-Minitron-8B-Base.Q8_0-00001-of-XXXXX.gguf Mistral-NeMo-Minitron-8B-Base.Q8_0.gguf
   - Make sure to point gguf-split to the first chunk of the split.
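The merge steps can be wrapped in a small script so you do not have to type the chunk count: the first chunk is the file whose name contains `-00001-of-`. The `first_chunk` and `merge_split` helpers below are hypothetical conveniences. The binary is called `gguf-split` in older llama.cpp releases and `llama-gguf-split` in newer ones, so the sketch reads the name from a `GGUF_SPLIT` variable (an assumption of this example) that defaults to the newer name.

```shell
# Hypothetical helper: print the first chunk (-00001-of-) in a chunk folder.
first_chunk() {
  for f in "$1"/*-00001-of-*.gguf; do
    # If the glob matched nothing, $f is the literal pattern and this fails.
    [ -f "$f" ] && { printf '%s\n' "$f"; return 0; }
  done
  return 1
}

# Hypothetical helper: merge a chunk folder into a single GGUF file.
# Set GGUF_SPLIT=gguf-split when using an older llama.cpp release.
merge_split() {
  first=$(first_chunk "$1") || { echo "no first chunk found in $1" >&2; return 1; }
  "${GGUF_SPLIT:-llama-gguf-split}" --merge "$first" "$2"
}

# Usage:
# merge_split Mistral-NeMo-Minitron-8B-Base.Q8_0 Mistral-NeMo-Minitron-8B-Base.Q8_0.gguf
```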

Got a suggestion? Ping me @legraphista!
