# NeoBERT Quantized Model
This project provides static GGUF quantizations of the NeoBERT model in a range of sizes for efficient use.
## Quick Start
If you're new to using GGUF files, check out TheBloke's READMEs for detailed guidance, including how to concatenate multi-part files.
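As a minimal download sketch (assuming the `huggingface_hub` package is installed; the filename is any of the quants listed in the table below):

```python
# Minimal sketch: fetch one quant file from this repository.
# Assumes `pip install huggingface_hub`; pick any filename from the
# "Provided Quants" table below.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="mradermacher/NeoBERT-GGUF",
    filename="NeoBERT.Q4_K_M.gguf",  # "fast, recommended" per the table
)
print(path)  # local path to the cached GGUF file
```

The files listed below are all single files, so no concatenation should be needed here.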
## Features
- Static quantizations of the [chandar-lab/NeoBERT](https://huggingface.co/chandar-lab/NeoBERT) model.
- A variety of quantized versions are available, sorted by size.
## Installation
The GGUF files need no installation of their own: download the quant you want and load it with any GGUF-capable runtime, e.g. llama.cpp or its Python bindings (`pip install llama-cpp-python`).
## Usage Examples
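Since NeoBERT is an encoder-style model, the natural GGUF workflow is embedding extraction. Below is a hedged sketch using the `llama-cpp-python` bindings; it assumes your build of llama.cpp supports the NeoBERT architecture, and the quant filename is taken from the table further down:

```python
# Sketch only: assumes `pip install llama-cpp-python huggingface_hub`
# and that the underlying llama.cpp build supports this architecture.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="mradermacher/NeoBERT-GGUF",
    filename="NeoBERT.Q4_K_M.gguf",
)

# embedding=True switches llama.cpp into embedding mode, which is the
# intended use of an encoder model like NeoBERT.
llm = Llama(model_path=model_path, embedding=True)

result = llm.create_embedding("GGUF makes quantized models easy to share.")
embedding = result["data"][0]["embedding"]
print(len(embedding))  # dimensionality of the sentence embedding
```

Any other GGUF-capable runtime (for example the `llama-embedding` tool shipped with llama.cpp) should work the same way with these files.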
## Documentation
### About
This project offers static quants of [chandar-lab/NeoBERT](https://huggingface.co/chandar-lab/NeoBERT). Weighted/imatrix quants currently appear to be unavailable. If they do not show up about a week after the static ones, they are probably not planned; you can request them by opening a Community Discussion.
### Provided Quants
The provided quantized models are sorted by size (not necessarily quality). IQ-quants are often preferable to similar-sized non-IQ quants.
| Link | Type | Size/GB | Notes |
|------|------|---------|-------|
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.Q2_K.gguf) | Q2_K | 0.2 | |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.Q3_K_S.gguf) | Q3_K_S | 0.2 | |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.Q3_K_M.gguf) | Q3_K_M | 0.2 | lower quality |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.IQ4_XS.gguf) | IQ4_XS | 0.2 | |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.Q3_K_L.gguf) | Q3_K_L | 0.2 | |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.Q4_K_S.gguf) | Q4_K_S | 0.2 | fast, recommended |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.Q4_K_M.gguf) | Q4_K_M | 0.2 | fast, recommended |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.Q5_K_S.gguf) | Q5_K_S | 0.3 | |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.Q5_K_M.gguf) | Q5_K_M | 0.3 | |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.Q6_K.gguf) | Q6_K | 0.3 | very good quality |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.Q8_0.gguf) | Q8_0 | 0.3 | fast, best quality |
| [GGUF](https://huggingface.co/mradermacher/NeoBERT-GGUF/resolve/main/NeoBERT.f16.gguf) | f16 | 0.5 | 16 bpw, overkill |
Here is a useful graph by ikawrakow comparing some lower-quality quant types (lower is better):

And here are Artefact2's thoughts on the matter:
https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
### FAQ / Model Request
For answers to common questions, or to request that other models be quantized, see Model Requests.
## License
The project is licensed under the MIT license.
## Thanks
I'm grateful to my company, nethype GmbH, for letting me use its servers and for upgrading my workstation, which made it possible to do this work in my free time.